Update README.md
README.md
Meta社の200言語以上の翻訳に対応した超多言語対応機械翻訳モデルNLLB-200シリーズと比較したベンチマーク結果は以下です。

Benchmark results compared with Meta's NLLB-200 series, massively multilingual machine translation models that support translation between more than 200 languages, are shown below.

## NLLB-200
| Model Name                   | File Size | E->J chrF++/F2 | E->J COMET | J->E chrF++/F2 | J->E COMET |
|------------------------------|-----------|----------------|------------|----------------|------------|
| NLLB-200-Distilled           | 2.46GB    | 23.6/-         | -          | 50.2/-         | -          |
| NLLB-200                     | 17.58GB   | 25.2/-         | -          | 55.1/-         | -          |
| NLLB-200                     | 220.18GB  | 27.9/33.2      | 0.8908     | 55.8/59.8      | 0.8792     |

## Our previous model (ALMA-7B-Ja)
| Model Name                   | File Size | E->J chrF++/F2 | E->J COMET | J->E chrF++/F2 | J->E COMET |
|------------------------------|-----------|----------------|------------|----------------|------------|
| webbigdata-ALMA-7B-Ja-q4_K_S | 3.6GB     | -/24.2         | 0.8210     | -/54.2         | 0.8559     |
| ALMA-7B-Ja-GPTQ-Ja-En        | 3.9GB     | -/30.8         | 0.8743     | -/60.9         | 0.8743     |
| ALMA-Ja(Ours)                | 13.48GB   | -/31.8         | 0.8811     | -/61.6         | 0.8773     |

## ALMA-7B-Ja-V2
| Model Name                   | File Size | E->J chrF++/F2 | E->J COMET | J->E chrF++/F2 | J->E COMET |
|------------------------------|-----------|----------------|------------|----------------|------------|
| ALMA-7B-Ja-V2-GPTQ-Ja-En     | 3.9GB     | -/33.0         | 0.8818     | -/62.0         | 0.8774     |
| ALMA-Ja-V2(Ours)             | 13.48GB   | -/33.9         | 0.8820     | -/63.1         | 0.8873     |
| ALMA-Ja-V2-Lora(Ours)        | 13.48GB   | -/33.7         | 0.8843     | -/61.1         | 0.8775     |

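The chrF++/F2 and COMET columns above (and the BLEU columns in the genre-specific tables below) are standard machine translation metrics. As a reference only, here is a minimal sketch of how such scores are typically computed with the `sacrebleu` and `unbabel-comet` packages; this is not necessarily the exact evaluation setup behind these tables, and the example sentences and COMET checkpoint are placeholders.

```python
# pip install sacrebleu unbabel-comet   (assumed packages, not the official evaluation script)
from sacrebleu.metrics import BLEU, CHRF

hyps = ["The weather is nice today."]             # placeholder system outputs
refs = [["It is nice weather today."]]            # placeholder references (one reference stream)

bleu = BLEU()                                     # use BLEU(trg_lang="ja") when scoring E->J output
chrf = CHRF(word_order=2)                         # word_order=2 gives the chrF2++ variant
print("BLEU:", bleu.corpus_score(hyps, refs).score)
print("chrF2++:", chrf.corpus_score(hyps, refs).score)

# COMET: the checkpoint name below is an assumption; any reference-based COMET model works.
from comet import download_model, load_from_checkpoint
comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": "今日はいい天気ですね。", "mt": hyps[0], "ref": refs[0][0]}]
print("COMET:", comet_model.predict(data, batch_size=8, gpus=0).system_score)
```
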
様々なジャンルの文章を実際のアプリケーションと比較した結果は以下です。

Here are the results of comparing texts from various genres against real-world translation applications.

## 政府の公式文章 Government Official Announcements
|                          | E->J chrF2++ | E->J BLEU | E->J COMET | J->E chrF2++ | J->E BLEU | J->E COMET |
|--------------------------|--------------|-----------|------------|--------------|-----------|------------|
| ALMA-7B-Ja-V2-GPTQ-Ja-En | 25.3         | 15.00     | 0.8848     | 60.3         | 26.82     | 0.6189     |
| google-translate         | 43.5         | 35.37     | 0.9181     | 62.7         | 29.22     | 0.6446     |
| deepl                    | 43.5         | 35.74     | 0.9301     | 60.1         | 27.40     | 0.6389     |

## 二次創作 Fanfiction
|                          | E->J chrF2++ | E->J BLEU | E->J COMET | J->E chrF2++ | J->E BLEU | J->E COMET |
|--------------------------|--------------|-----------|------------|--------------|-----------|------------|
| ALMA-7B-Ja-V2-GPTQ-Ja-En | 27.6         | 18.28     | 0.8643     | 52.1         | 24.58     | 0.6106    |
| deepl                    | 33.5         | 28.38     | 0.9094     | 60.0         | 31.14     | 0.6124    |

[Sample Code For Free Colab](https://github.com/webbigdata-jp/python_sample/blob/main/ALMA_7B_Ja_V2_Free_Colab_sample.ipynb)
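If Colab is not convenient, the following is a minimal local inference sketch. It assumes this repository's model id is `webbigdata/ALMA-7B-Ja-V2`, a GPU with enough memory for fp16 weights, and the translation prompt format of the upstream ALMA models; the notebook above is the maintained sample.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "webbigdata/ALMA-7B-Ja-V2"  # assumption: id of the model described in this README
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Translation prompt in the style of the upstream ALMA models (assumed unchanged in V2).
prompt = "Translate this from Japanese to English:\nJapanese: 今日はいい天気ですね。\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens (the translation).
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
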

## Other Version

### ALMA-7B-Ja-V2-GPTQ-Ja-En
GPTQ is a quantization method that reduces model size. ALMA-7B-Ja-V2-GPTQ-Ja-En is a GPTQ-quantized version of this model with a smaller file size (3.9GB) and lower memory usage, but its performance is likely somewhat lower, and its translation quality for languages other than Japanese and English has deteriorated significantly.

[Sample Code For Free Colab webbigdata/ALMA-7B-Ja-V2-GPTQ-Ja-En](https://github.com/webbigdata-jp/ALMA/blob/master/ALMA_7B_Ja_V2_GPTQ_Ja_En_Free_Colab_sample.ipynb)
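For reference, here is a minimal loading sketch for the GPTQ version. It assumes the `auto-gptq` package; the exact arguments (and whether the weights are stored as safetensors) may differ from the official notebook above.

```python
# pip install auto-gptq transformers   (assumed setup; see the official notebook above)
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

quantized_id = "webbigdata/ALMA-7B-Ja-V2-GPTQ-Ja-En"
tokenizer = AutoTokenizer.from_pretrained(quantized_id)
model = AutoGPTQForCausalLM.from_quantized(
    quantized_id,
    device="cuda:0",
    use_safetensors=True,  # assumption: adjust if the repo stores .bin weights instead
)
# Prompting and generation are the same as for the full-precision model shown earlier.
```
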
If you want to translate an entire file at once, try the Colab notebook below.

[ALMA_7B_Ja_GPTQ_Ja_En_batch_translation_sample](https://github.com/webbigdata-jp/ALMA/blob/master/ALMA_7B_Ja_V2_GPTQ_Ja_En_batch_translation_sample.ipynb)
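The notebook above is the maintained batch sample; as a rough illustration of the idea, here is a minimal sketch that translates a text file line by line. The file names, model id, and prompt format are assumptions, as in the earlier sketches.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "webbigdata/ALMA-7B-Ja-V2"  # assumption, as in the earlier sketch
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

def translate_ja_to_en(text: str) -> str:
    # ALMA-style translation prompt; only the newly generated tokens are decoded.
    prompt = f"Translate this from Japanese to English:\nJapanese: {text}\nEnglish:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

# "input_ja.txt" / "output_en.txt" are placeholder names; one sentence per line.
with open("input_ja.txt", encoding="utf-8") as fin, open("output_en.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        if line.strip():
            fout.write(translate_ja_to_en(line.strip()) + "\n")
```
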
**ALMA** (**A**dvanced **L**anguage **M**odel-based tr**A**nslator) is an LLM-based translation model, which adopts a new translation model paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance.

Original Model [ALMA-7B](https://huggingface.co/haoranxu/ALMA-7B). (26.95GB)

Previous Model [ALMA-7B-Ja](https://huggingface.co/webbigdata/ALMA-7B-Ja). (13.3 GB)

## About this work
- **This work was done by:** [webbigdata](https://webbigdata.jp/).