---

# webbigdata/ALMA-7B-Ja-V2

ALMA-7B-Ja-V2は日本語から英語、英語から日本語の翻訳が可能な機械翻訳モデルです。
ALMA-7B-Ja-V2 is a machine translation model capable of translating from Japanese to English and from English to Japanese.

ALMA-7B-Ja-V2は以前のモデル(ALMA-7B-Ja)に更に学習を追加し、性能を向上しています。
ALMA-7B-Ja-V2 builds on the previous model (ALMA-7B-Ja) with additional training and improved performance.

日本語と英語間に加えて、このモデルは以下の言語間の翻訳能力も持っていますが、日英、英日翻訳を主目的にしています。
In addition to translation between Japanese and English, this model can also translate between the following language pairs, although it is primarily intended for Japanese-to-English and English-to-Japanese translation.

- ドイツ語 German(de) and 英語 English(en)
- 中国語 Chinese(zh) and 英語 English(en)
The following three metrics were used to check translation performance.

数字は多いほど性能が良い事を表します。
The higher the number, the better the performance.

### BLEU

翻訳テキストが元のテキストにどれだけ似ているかを評価する指標。しかし、単語の出現頻度だけを見ているため、語順の正確さや文の流暢さを十分に評価できないという弱点があります。
A metric that evaluates how similar the translated text is to the original text. However, because it looks mainly at word-frequency overlap, it cannot fully evaluate word-order accuracy or sentence fluency.
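To make the weakness concrete, here is a minimal, self-contained sketch of BLEU's core computation. It is only an illustration, not the official algorithm; real evaluations should use a maintained tool such as sacreBLEU, which also handles tokenization and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(hypothesis, reference, max_n=4):
    """Toy single-sentence BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # avoid log(0)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * geo_mean

# An identical sentence scores 1.0, while a word-shuffled one still
# gets full unigram credit -- exactly the weakness noted above.
print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))  # → 1.0
```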
### chrF++

文字の組み合わせの一致度と語順に基づいて、翻訳の正確さを評価する方法。弱点としては、長い文の評価には不向きであることが挙げられます。
A method that evaluates translation accuracy based on the overlap of character combinations and word order. A drawback is that it is not well suited to evaluating longer sentences.
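The character-level idea can be sketched as follows. This is a simplified illustration only: full chrF++ (as implemented in sacreBLEU) additionally mixes in word unigram and bigram statistics, which this toy version omits.

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams, with spaces removed for simplicity."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Toy chrF: average character n-gram precision and recall,
    combined into an F-score that weights recall beta^2 times more."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp_c = char_ngrams(hypothesis, n)
        ref_c = char_ngrams(reference, n)
        overlap = sum(min(c, ref_c[g]) for g, c in hyp_c.items())
        precisions.append(overlap / max(sum(hyp_c.values()), 1))
        recalls.append(overlap / max(sum(ref_c.values()), 1))
    p = sum(precisions) / max_n
    r = sum(recalls) / max_n
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(round(simple_chrf("a cat sat", "a cat sat"), 3))  # identical → 1.0
```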
### comet

機械学習モデルを使って翻訳の品質を自動的に評価するためのツール。機械学習ベースであるため、元々のモデルが学習に使ったデータに大きく依存するという弱点があります。
A metric that uses a machine learning model to automatically assess translation quality. Being machine-learning based, it has the drawback of depending heavily on the data the underlying model was trained on.
## vs. NLLB-200

Meta社の200言語以上の翻訳に対応した超多言語対応機械翻訳モデルNLLB-200シリーズと比較したベンチマーク結果は以下です。
Benchmark results compared to Meta's NLLB-200 series of super multilingual machine translation models, which support translation between more than 200 languages, are shown below.

| Model Name                   | file size | E->J chrf++/F2 | E->J comet | J->E chrf++/F2 | J->E comet |
|------------------------------|-----------|----------------|------------|----------------|------------|
| NLLB-200                     | 17.58GB   | 25.2/-         | -          | 55.1/-         | -          |
| NLLB-200                     | 220.18GB  | 27.9/33.2      | 0.8908     | 55.8/59.8      | 0.8792     |

## Our previous model (ALMA-7B-Ja)

| Model Name                   | file size | E->J chrf++/F2 | E->J comet | J->E chrf++/F2 | J->E comet |
|------------------------------|-----------|----------------|------------|----------------|------------|
| webbigdata-ALMA-7B-Ja-q4_K_S | 3.6GB     | -/24.2         | 0.8210     | -/54.2         | 0.8559     |
| ALMA-7B-Ja-GPTQ-Ja-En        | 3.9GB     | -/30.8         | 0.8743     | -/60.9         | 0.8743     |
| ALMA-Ja(Ours)                | 13.48GB   | -/31.8         | 0.8811     | -/61.6         | 0.8773     |

## ALMA-7B-Ja-V2

| Model Name                   | file size | E->J chrf++/F2 | E->J comet | J->E chrf++/F2 | J->E comet |
|------------------------------|-----------|----------------|------------|----------------|------------|
| ALMA-7B-Ja-V2-GPTQ-Ja-En     | 3.9GB     | -/33.0         | 0.8818     | -/62.0         | 0.8774     |

ALMA-7B-Ja-V2を様々なジャンルの文章で現実世界のアプリケーションと比較した結果は以下です。
Below are the results of comparing ALMA-7B-Ja-V2 with real-world applications on texts from a variety of genres.

## 政府の公式文章 Government Official Announcements

|                          | e->j chrF2++ | e->j BLEU | e->j comet | j->e chrF2++ | j->e BLEU | j->e comet |
|--------------------------|--------------|-----------|------------|--------------|-----------|------------|
| ALMA-7B-Ja-V2-GPTQ-Ja-En | 25.3         | 15.00     | 0.8848     | 60.3         | 26.82     | 0.6189     |
| google-translate         | 43.5         | 35.37     | 0.9181     | 62.7         | 29.22     | 0.6446     |
| deepl                    | 43.5         | 35.74     | 0.9301     | 60.1         | 27.40     | 0.6389     |

## 古典文学 Classical Literature

|                          | e->j chrF2++ | e->j BLEU | e->j comet | j->e chrF2++ | j->e BLEU | j->e comet |
|--------------------------|--------------|-----------|------------|--------------|-----------|------------|
| ALMA-7B-Ja-V2-GPTQ-Ja-En | 11.8         | 7.24      | 0.6943     | 31.9         | 9.71      | 0.5617     |
| deepl                    | 14.4         | 9.18      | 0.7149     | 34.6         | 10.68     | 0.5787     |
| google-translate         | 13.5         | 8.57      | 0.7432     | 31.7         | 7.94      | 0.5856     |

## 二次創作 Fanfiction

|                          | e->j chrF2++ | e->j BLEU | e->j comet | j->e chrF2++ | j->e BLEU | j->e comet |
|--------------------------|--------------|-----------|------------|--------------|-----------|------------|
| ALMA-7B-Ja-V2-GPTQ-Ja-En | 27.6         | 18.28     | 0.8643     | 52.1         | 24.58     | 0.6106     |

[Sample Code For Free Colab](https://github.com/webbigdata-jp/python_sample/blob/main/ALMA_7B_Ja_V2_Free_Colab_sample.ipynb)
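The notebook above shows full inference. The prompt it relies on follows the upstream ALMA convention, sketched below; the helper name is hypothetical and the exact template should be taken from the linked notebook.

```python
# Illustrative only: builds an ALMA-style translation prompt.
# The authoritative template is in the linked Colab sample.
def build_prompt(source_text, source_lang="Japanese", target_lang="English"):
    return (
        f"Translate this from {source_lang} to {target_lang}:\n"
        f"{source_lang}: {source_text}\n"
        f"{target_lang}:"
    )

# The model is expected to continue the text after the final colon
# with the translation.
print(build_prompt("私はサッカーが好きです。"))
```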
## Other Versions

### ALMA-7B-Ja-V2-GPTQ-Ja-En