ai-forever
committed on
Commit
•
ab7cd5a
1
Parent(s):
c724bd3
Update README.md
README.md
CHANGED
@@ -23,15 +23,15 @@ model-index:
|
|
23 |
metrics:
|
24 |
- name: Precision
|
25 |
type: precision
|
26 |
-
value:
|
27 |
verified: false
|
28 |
- name: Recall
|
29 |
type: recall
|
30 |
-
value:
|
31 |
verified: false
|
32 |
- name: F1
|
33 |
type: f1
|
34 |
-
value:
|
35 |
verified: false
|
36 |
- task:
|
37 |
type: text-generation
|
@@ -41,15 +41,15 @@ model-index:
|
|
41 |
metrics:
|
42 |
- name: Precision
|
43 |
type: precision
|
44 |
-
value:
|
45 |
verified: false
|
46 |
- name: Recall
|
47 |
type: recall
|
48 |
-
value:
|
49 |
verified: false
|
50 |
- name: F1
|
51 |
type: f1
|
52 |
-
value:
|
53 |
verified: false
|
54 |
- task:
|
55 |
type: text-generation
|
@@ -59,15 +59,15 @@ model-index:
|
|
59 |
metrics:
|
60 |
- name: Precision
|
61 |
type: precision
|
62 |
-
value:
|
63 |
verified: false
|
64 |
- name: Recall
|
65 |
type: recall
|
66 |
-
value:
|
67 |
verified: false
|
68 |
- name: F1
|
69 |
type: f1
|
70 |
-
value:
|
71 |
verified: false
|
72 |
- task:
|
73 |
type: text-generation
|
@@ -77,15 +77,15 @@ model-index:
|
|
77 |
metrics:
|
78 |
- name: Precision
|
79 |
type: precision
|
80 |
-
value:
|
81 |
verified: false
|
82 |
- name: Recall
|
83 |
type: recall
|
84 |
-
value:
|
85 |
verified: false
|
86 |
- name: F1
|
87 |
type: f1
|
88 |
-
value:
|
89 |
verified: false
|
90 |
- task:
|
91 |
type: text-generation
|
@@ -131,7 +131,7 @@ model-index:
|
|
131 |
## Summary
|
132 |
|
133 |
The model corrects spelling errors and typos in both Russian and English languages by bringing all the words in the text to the norm of the language.
|
134 |
-
Corrector had been trained based on the model [
|
135 |
An extensive dataset with “artificial” errors was taken as a training corpus: the corpus was assembled on the basis of the Russian-language Wikipedia and transcripts of Russian-language videos, then typos and spelling errors were automatically introduced into it using the library [SAGE](https://github.com/ai-forever/sage).
|
136 |
|
137 |
## Public references
|
@@ -164,7 +164,8 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
|
|
164 |
**RUSpellRU**
|
165 |
| Model | Precision | Recall | F1 |
|
166 |
| --- | --- | --- | --- |
|
167 |
-
| sage-mt5-large |
|
|
|
168 |
| sage-ai-service | 93.5 | 82.4 | 87.6 |
|
169 |
| gpt-3.5-turbo | 39.6 | 62.3 | 48.5 |
|
170 |
| gpt-4 | 69.5 | 81.0 | 74.8 |
|
@@ -172,7 +173,8 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
|
|
172 |
**MultidomainGold**
|
173 |
| Model | Precision | Recall | F1 |
|
174 |
| --- | --- | --- | --- |
|
175 |
-
| sage-mt5-large |
|
|
|
176 |
| sage-ai-service | 70.9 | 68.8 | 69.9 |
|
177 |
| gpt-3.5-turbo | 17.8 | 56.1 | 27.0 |
|
178 |
| gpt-4 | 31.1 | 78.1 | 44.5 |
|
@@ -180,20 +182,39 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
|
|
180 |
**MedSpellChecker**
|
181 |
| Model | Precision | Recall | F1 |
|
182 |
| --- | --- | --- | --- |
|
183 |
-
| sage-mt5-large |
|
|
|
184 |
| sage-ai-service | 73.4 | 76.2 | 74.9 |
|
185 |
| gpt-3.5-turbo | 15.1 | 53.6 | 23.5 |
|
186 |
| gpt-4 | 48.9 | 88.7 | 63.1 |
|
187 |
|
188 |
-
|
189 |
**GitHubTypoCorpusRu**
|
190 |
| Model | Precision | Recall | F1 |
|
191 |
| --- | --- | --- | --- |
|
192 |
-
| sage-mt5-large |
|
|
|
193 |
| sage-ai-service | 76.1 | 51.2 | 61.2 |
|
194 |
| gpt-3.5-turbo | 23.7 | 43.9 | 30.8 |
|
195 |
| gpt-4 | 34.7 | 60.5 | 44.1|
|
196 |
|
197 |
|
198 |
## How to use
|
199 |
```python
|
|
|
23 |
metrics:
|
24 |
- name: Precision
|
25 |
type: precision
|
26 |
+
value: 56.2
|
27 |
verified: false
|
28 |
- name: Recall
|
29 |
type: recall
|
30 |
+
value: 65.8
|
31 |
verified: false
|
32 |
- name: F1
|
33 |
type: f1
|
34 |
+
value: 60.6
|
35 |
verified: false
|
36 |
- task:
|
37 |
type: text-generation
|
|
|
41 |
metrics:
|
42 |
- name: Precision
|
43 |
type: precision
|
44 |
+
value: 42.1
|
45 |
verified: false
|
46 |
- name: Recall
|
47 |
type: recall
|
48 |
+
value: 47.5
|
49 |
verified: false
|
50 |
- name: F1
|
51 |
type: f1
|
52 |
+
value: 44.6
|
53 |
verified: false
|
54 |
- task:
|
55 |
type: text-generation
|
|
|
59 |
metrics:
|
60 |
- name: Precision
|
61 |
type: precision
|
62 |
+
value: 38.6
|
63 |
verified: false
|
64 |
- name: Recall
|
65 |
type: recall
|
66 |
+
value: 56.0
|
67 |
verified: false
|
68 |
- name: F1
|
69 |
type: f1
|
70 |
+
value: 45.7
|
71 |
verified: false
|
72 |
- task:
|
73 |
type: text-generation
|
|
|
77 |
metrics:
|
78 |
- name: Precision
|
79 |
type: precision
|
80 |
+
value: 52.8
|
81 |
verified: false
|
82 |
- name: Recall
|
83 |
type: recall
|
84 |
+
value: 49.8
|
85 |
verified: false
|
86 |
- name: F1
|
87 |
type: f1
|
88 |
+
value: 51.2
|
89 |
verified: false
|
90 |
- task:
|
91 |
type: text-generation
|
|
|
131 |
## Summary
|
132 |
|
133 |
The model corrects spelling errors and typos in both Russian and English languages by bringing all the words in the text to the norm of the language.
|
134 |
+
Corrector had been trained based on the model [mT5-large](https://huggingface.co/google/mt5-large) architecture.
|
135 |
An extensive dataset with “artificial” errors was taken as a training corpus: the corpus was assembled on the basis of the Russian-language Wikipedia and transcripts of Russian-language videos, then typos and spelling errors were automatically introduced into it using the library [SAGE](https://github.com/ai-forever/sage).
|
136 |
|
137 |
## Public references
|
|
|
164 |
**RUSpellRU**
|
165 |
| Model | Precision | Recall | F1 |
|
166 |
| --- | --- | --- | --- |
|
167 |
+
| sage-mt5-large | 56.2 | 65.8 | 60.6 |
|
168 |
+
| sage-mt5-large (ft.) | 88.4 | 71.6 | 79.1 |
|
169 |
| sage-ai-service | 93.5 | 82.4 | 87.6 |
|
170 |
| gpt-3.5-turbo | 39.6 | 62.3 | 48.5 |
|
171 |
| gpt-4 | 69.5 | 81.0 | 74.8 |
|
|
|
173 |
**MultidomainGold**
|
174 |
| Model | Precision | Recall | F1 |
|
175 |
| --- | --- | --- | --- |
|
176 |
+
| sage-mt5-large | 42.1 | 47.5 | 44.6 |
|
177 |
+
| sage-mt5-large (ft.) | 65.3 | 62.7 | 63.9 |
|
178 |
| sage-ai-service | 70.9 | 68.8 | 69.9 |
|
179 |
| gpt-3.5-turbo | 17.8 | 56.1 | 27.0 |
|
180 |
| gpt-4 | 31.1 | 78.1 | 44.5 |
|
|
|
182 |
**MedSpellChecker**
|
183 |
| Model | Precision | Recall | F1 |
|
184 |
| --- | --- | --- | --- |
|
185 |
+
| sage-mt5-large | 38.6 | 56.0 | 45.7 |
|
186 |
+
| sage-mt5-large (ft.) | 77.7 | 77.5 | 77.6 |
|
187 |
| sage-ai-service | 73.4 | 76.2 | 74.9 |
|
188 |
| gpt-3.5-turbo | 15.1 | 53.6 | 23.5 |
|
189 |
| gpt-4 | 48.9 | 88.7 | 63.1 |
|
190 |
|
|
|
191 |
**GitHubTypoCorpusRu**
|
192 |
| Model | Precision | Recall | F1 |
|
193 |
| --- | --- | --- | --- |
|
194 |
+
| sage-mt5-large | 52.8 | 49.8 | 51.2 |
|
195 |
+
| sage-mt5-large (ft.) | 69.5 | 46.0 | 55.3 |
|
196 |
| sage-ai-service | 76.1 | 51.2 | 61.2 |
|
197 |
| gpt-3.5-turbo | 23.7 | 43.9 | 30.8 |
|
198 |
| gpt-4 | 34.7 | 60.5 | 44.1|
|
199 |
|
200 |
+
**BEA60K**
|
201 |
+
| Model | Precision | Recall | F1 |
|
202 |
+
| --- | --- | --- | --- |
|
203 |
+
| sage-mt5-large | 64.7 | 83.8 | 73.0 |
|
204 |
+
| gpt-3.5-turbo | 66.9 | 84.1 | 74.5 |
|
205 |
+
| gpt-4 | 68.6 | 85.2 | 76.0 |
|
206 |
+
| Bert (https://github.com/neuspell/neuspell) | 65.8 | 79.6 | 72.0 |
|
207 |
+
| SC-LSTM (https://github.com/neuspell/neuspell) | 62.2 | 80.3 | 72.0 |
|
208 |
+
|
209 |
+
**JFLEG**
|
210 |
+
| Model | Precision | Recall | F1 |
|
211 |
+
| --- | --- | --- | --- |
|
212 |
+
| sage-mt5-large | 74.9 | 88.4 | 81.1 |
|
213 |
+
| gpt-3.5-turbo | 77.8 | 88.6 | 82.9 |
|
214 |
+
| gpt-4 | 77.9 | 88.3 | 82.8 |
|
215 |
+
| Bert (https://github.com/neuspell/neuspell) | 78.5 | 85.4 | 81.8 |
|
216 |
+
| SC-LSTM (https://github.com/neuspell/neuspell) | 80.6 | 86.1 | 83.2 |
|
217 |
+
|
218 |
|
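Review note: the F1 values added in this commit can be sanity-checked against the precision and recall columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall (the README does not state the formula) and uses only the `sage-mt5-large` rows from the tables above. The names `ROWS`, `f1_score`, and `inconsistent_rows` are illustrative and not part of SAGE. Because the published numbers are rounded to one decimal place, the recomputed F1 is compared with a tolerance of 0.1 rather than exactly.

```python
# Rows copied from the sage-mt5-large entries added in this commit:
# (dataset, precision, recall, reported F1), all in percent.
ROWS = [
    ("RUSpellRU", 56.2, 65.8, 60.6),
    ("MultidomainGold", 42.1, 47.5, 44.6),
    ("MedSpellChecker", 38.6, 56.0, 45.7),
    ("GitHubTypoCorpusRu", 52.8, 49.8, 51.2),
    ("BEA60K", 64.7, 83.8, 73.0),
    ("JFLEG", 74.9, 88.4, 81.1),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    total = precision + recall
    if total == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / total


def inconsistent_rows(rows, tolerance=0.1):
    """Return dataset names whose reported F1 differs from the recomputed one.

    The tolerance absorbs the rounding of the published values to one decimal.
    """
    return [
        name
        for name, precision, recall, reported in rows
        if abs(f1_score(precision, recall) - reported) > tolerance
    ]


if __name__ == "__main__":
    for name, precision, recall, reported in ROWS:
        recomputed = f1_score(precision, recall)
        print(f"{name}: reported {reported}, recomputed {recomputed:.2f}")
    print("Inconsistent rows:", inconsistent_rows(ROWS))
```

The same check can be pointed at the `model-index` metric values in the YAML header, since they repeat the first four rows.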
219 |
## How to use
|
220 |
```python
|