---
license: mit
language:
 - ja
base_model: microsoft/mdeberta-v3-base
tags:
 - generated_from_trainer
 - bert
 - zero-shot-classification
 - text-classification
datasets:
 - MoritzLaurer/multilingual-NLI-26lang-2mil7
 - shunk031/JGLUE
metrics:
 - accuracy
 - f1
model-index:
 - name: mDeBERTa-v3-base-finetuned-nli-jnli
   results: []
pipeline_tag: zero-shot-classification
widget:
- text: 今日の予定を教えて
  candidate_labels: 天気,ニュース,金融,予定
---


# mDeBERTa-v3-base-finetuned-nli-jnli

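This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base). The card's metadata lists `MoritzLaurer/multilingual-NLI-26lang-2mil7` and `shunk031/JGLUE` as its datasets.
It achieves the following results on the evaluation set (these match the last logged evaluation, step 15000, in the training results table):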
- Loss: 0.7739
- Accuracy: 0.6808
- F1: 0.6742

## Model description

More information needed

## Intended uses & limitations
#### Zero-shot classification

```python
from transformers import pipeline

model_name = "thkkvui/mDeBERTa-v3-base-finetuned-nli-jnli"
classifier = pipeline("zero-shot-classification", model=model_name)

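# Japanese queries: "Tell me today's weather", "Any news?", "Check my schedule", "What's the dollar-yen rate?"
text = ["今日の天気を教えて", "ニュースある?", "予定をチェックして", "ドル円は?"]
# Candidate labels: weather, news, finance, schedule
labels = ["天気", "ニュース", "金融", "予定"]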

for t in text:
    output = classifier(t, labels, multi_label=False)
    print(output)
```
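
The zero-shot pipeline returns a dict with `sequence`, `labels`, and `scores` keys for each input. As a minimal sketch of how that result might be consumed, the helper below picks the highest-scoring label. The name `top_prediction` is hypothetical, and the sample dict is a hand-written stand-in with made-up scores, not real output from this model.

```python
from typing import Any, Dict, Tuple


def top_prediction(output: Dict[str, Any]) -> Tuple[str, float]:
    """Return the best (label, score) pair from a zero-shot pipeline result."""
    # Take the max explicitly instead of relying on the order of the lists.
    label, score = max(
        zip(output["labels"], output["scores"]), key=lambda pair: pair[1]
    )
    return label, float(score)


# Hand-written stand-in for a pipeline result; the scores are invented.
example_output = {
    "sequence": "今日の天気を教えて",
    "labels": ["天気", "予定", "ニュース", "金融"],
    "scores": [0.91, 0.05, 0.03, 0.01],
}

print(top_prediction(example_output))
```

In the loop above, the same helper could be applied to each `output` to route a query to a single intent.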

#### NLI use-case
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

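device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model_name = "thkkvui/mDeBERTa-v3-base-finetuned-nli-jnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Move the model to the selected device so that `device` is actually used.
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

premise = "NY Yankees is the professional baseball team in America."
# Japanese hypothesis: "Among Major League teams, the New York Yankees are famous in Japan."
hypothesis = "メジャーリーグのチームは、日本ではニューヨークヤンキースが有名だ。"

# The inputs must live on the same device as the model.
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)

with torch.no_grad():
    output = model(**inputs)

# Convert logits to probabilities. The label order below is assumed;
# check it against model.config.id2label before relying on it.
preds = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]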
result = {name: round(float(pred) * 100, 1) for pred, name in zip(preds, label_names)}
print(result)
```

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.06
- num_epochs: 2
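
To make the `linear` scheduler and the 0.06 warmup ratio concrete, here is a rough sketch of how the learning rate would evolve over training: a linear ramp from zero to the peak rate during warmup, then a linear decay back to zero. The function name `linear_schedule_lr` is hypothetical. The total step count is an assumption estimated from the training results table (step 15000 at epoch 1.6 suggests roughly 9,375 steps per epoch, so about 18,750 steps for 2 epochs); it is not stated in this card.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=3e-5, warmup_ratio=0.06):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed total, estimated from the results table rather than reported.
TOTAL_STEPS = 18_750

for step in (0, 500, 5_000, 10_000, 15_000, TOTAL_STEPS):
    print(step, f"{linear_schedule_lr(step, TOTAL_STEPS):.2e}")
```

Under this assumption, warmup covers only the first few percent of training, so all three logged evaluations fall in the decay phase.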

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|
| 0.753         | 0.53  | 5000  | 0.8758          | 0.6105   | 0.6192 |
| 0.5947        | 1.07  | 10000 | 0.6619          | 0.7054   | 0.7035 |
| 0.5791        | 1.6   | 15000 | 0.7739          | 0.6808   | 0.6742 |


### Framework versions

- Transformers 4.33.2
- PyTorch 2.0.1
- Datasets 2.14.5
- Tokenizers 0.13.3