Update README.md
README.md
CHANGED
@@ -1,66 +1,48 @@
 ---
-license:
-
--
-
--
--
-
--
-
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # luke-large-defamation-detection-japanese
 
-This model is a fine-tuned version of [studio-ousia/luke-japanese-large](https://huggingface.co/studio-ousia/luke-japanese-large)
-It achieves the following results on the evaluation set:
-- Loss: 0.4430
-- Accuracy: 0.6616
-- F1: 0.6381
-- Auc: 0.8630
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
 
-
 
-
-
-
-
-
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- num_epochs: 4
-- mixed_precision_training: Native AMP
 
-
 
-| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Auc    |
-|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:------:|
-| 0.4219        | 1.0   | 1780 | 0.3979          | 0.6630   | 0.6084 | 0.8466 |
-| 0.3375        | 2.0   | 3560 | 0.4050          | 0.6706   | 0.6242 | 0.8618 |
-| 0.2716        | 3.0   | 5340 | 0.4362          | 0.6595   | 0.6370 | 0.8626 |
-| 0.2331        | 4.0   | 7120 | 0.4430          | 0.6616   | 0.6381 | 0.8630 |
 
 
-
 
--
-- Pytorch 1.13.1+cu116
-- Datasets 2.8.0
-- Tokenizers 0.13.2
 ---
+license: cc-by-sa-4.0
+datasets:
+- kubota/defamation-japanese-twitter
+language:
+- ja
+pipeline_tag: text-classification
+widget:
+- text: お前のことを殺すぞ  # "I'm going to kill you"
+- text: 本当に不細工だなぁ  # "You really are ugly"
+- text: あの人は殺人を犯した犯罪者らしい  # "I hear that person is a criminal who committed murder"
 ---
 
 # luke-large-defamation-detection-japanese
+# 日本語誹謗中傷検出器 (Japanese Defamation Detector)
 
+This model is a version of [studio-ousia/luke-japanese-large](https://huggingface.co/studio-ousia/luke-japanese-large) fine-tuned for automatic defamation detection in Japanese text.
 
+The base model was fine-tuned on a balanced dataset created by merging two datasets:
+- [![Generic badge](https://img.shields.io/badge/Dataset-DefamationJapaneseTwitter-red.svg)](https://huggingface.co/datasets/kubota/defamation-japanese-twitter)
+- `DefamationJapaneseYouTube`: TBA
 
+<b>Labels</b>:\
+0 -> "中傷性のない発言" (non-defamatory statement)\
+1 -> "脅迫的な発言" (threatening statement)\
+2 -> "侮蔑的な発言" (insulting statement)\
+3 -> "名誉を低下させる発言" (statement that damages someone's reputation)
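The label list above can be expressed as a small lookup table for downstream code. This is a minimal sketch, not code from the model repository: the names `ID2LABEL`, `LABEL2ID`, and `is_defamatory` are hypothetical, and treating every class other than 0 as defamatory is an assumption based on the label descriptions.

```python
# Label ids and names as listed in the model card.
ID2LABEL = {
    0: "中傷性のない発言",      # non-defamatory statement
    1: "脅迫的な発言",          # threatening statement
    2: "侮蔑的な発言",          # insulting statement
    3: "名誉を低下させる発言",  # statement that damages someone's reputation
}

# Reverse mapping from label name to id.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def is_defamatory(label: str) -> bool:
    """Return True for any class other than 0 (assumption: 0 is the only benign class)."""
    return LABEL2ID[label] != 0


print(is_defamatory("脅迫的な発言"))
```

A lookup like this keeps the Japanese label strings in one place, so application code can branch on the numeric id instead of comparing strings.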
 
+## Example Pipeline
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kubotaissei/defamation_japanese_twitter/blob/master/notebooks/pipeline_example.ipynb)
+```python
+# !pip install transformers==4.26 sentencepiece
+from transformers import pipeline
+# Load the fine-tuned classifier from the Hugging Face Hub; the task is inferred from the model's Hub metadata.
+pipe = pipeline(model="kubota/luke-large-defamation-detection-japanese")
+# Classify one sentence; the result is a list holding the top label and its score.
+pipe("あの人は殺人を犯した犯罪者らしい")
+```
+```
+[{'label': '名誉を低下させる発言', 'score': 0.8889994621276855}]
+```
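The pipeline returns a list of dictionaries with `label` and `score` keys, as in the output above. The following is a minimal sketch of post-processing such a result without loading the model. The helper name `flag_defamation` and the 0.5 threshold are assumptions made for illustration and do not come from the model card.

```python
from typing import Dict, List, Union

# Label 0 in the model card: "non-defamatory statement".
NON_DEFAMATORY = "中傷性のない発言"

Prediction = Dict[str, Union[str, float]]


def flag_defamation(result: List[Prediction], threshold: float = 0.5) -> bool:
    """Return True if the top prediction is a defamatory class with score >= threshold.

    `threshold` is a hypothetical cut-off; tune it on your own validation data.
    """
    top = max(result, key=lambda r: r["score"])
    return top["label"] != NON_DEFAMATORY and top["score"] >= threshold


# Output copied from the pipeline example above.
example = [{"label": "名誉を低下させる発言", "score": 0.8889994621276855}]
print(flag_defamation(example))  # True
```

Raising the threshold trades recall for precision: with `threshold=0.9` the same example would not be flagged.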
+## Training Scripts
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kubotaissei/defamation_japanese_twitter/blob/master/notebooks/train_example.ipynb)
 
 
+## Licenses
 
+The fine-tuned model and all attached files are licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0)](http://creativecommons.org/licenses/by-sa/4.0/).
 
+<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>