ai-forever committed
Commit 074b7ed
1 Parent(s): f3c6b2c

Update README.md

Files changed (1): README.md (+7 -3)
README.md CHANGED
@@ -92,19 +92,19 @@ model-index:
 
 ![banner](images/sage_banner.jpg)
 
-### Summary
+## Summary
 
 The model corrects spelling and punctuation errors and typos by normalizing all words in the text to standard Russian.
 The corrector was trained on top of the [FRED-T5-large](https://huggingface.co/ai-forever/FRED-T5-large) model.
 The training corpus is an extensive dataset with “artificial” errors: it was assembled from Russian-language Wikipedia and from transcripts of Russian-language videos, into which typos and spelling errors were then automatically injected with the [SAGE](https://github.com/ai-forever/sage) library.
 
-### Public references
+## Public references
 - [SAGE library announcement](https://youtu.be/yFfkV0Qjuu0), DataFest 2023
 - [Paper about synthetic error generation methods](https://www.dialog-21.ru/media/5914/martynovnplusetal056.pdf), Dialogue 2023
 - [SAGE EACL 2024 paper](https://aclanthology.org/2024.findings-eacl.10/)
 
 
-### Examples
+## Examples
 | Input | Output |
 | --- | --- |
 | И не чсно прохожим в этот день непогожйи почему я веселый такйо | И не ясно прохожим в этот день непогожий, почему я веселый такой. |
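The next hunk touches only the tail of the README's usage snippet (`print(res)` and its decoded output). For orientation, a minimal inference sketch follows. It is an assumption-laden sketch, not the README's own code: it presumes the checkpoint loads with the standard `transformers` Seq2Seq classes and, following the FRED-T5 convention, prepends a `<LM>` task prefix; the exact invocation should be checked against the README's full usage block.

```python
# Hedged sketch: load the corrector and fix the first example from the table above.
# The "<LM>" prefix and the generation settings are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "ai-forever/sage-fredt5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "И не чсно прохожим в этот день непогожйи почему я веселый такйо"
inputs = tokenizer("<LM>" + text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=inputs["input_ids"].shape[1] + 16)
res = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(res)
# Expected, per the Examples table:
# ["И не ясно прохожим в этот день непогожий, почему я веселый такой."]
```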
@@ -187,6 +187,10 @@ print(res)
 # ["И не ясно прохожим в этот день непогожий, почему я веселый такой."]
 ```
 
+## Limitations
+- The model is intended to be fine-tuned on datasets with natural errors. The released model is a pre-trained checkpoint, and the pre-training task differs from typical spellchecking in the density of noise in the corpus.
+
 ## Resources
 - [SAGE library](https://github.com/ai-forever/sage), GitHub
 - [sage-fredt5-large](https://huggingface.co/ai-forever/sage-fredt5-large), HuggingFace
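The new Limitations note says the released checkpoint is a pre-trained model meant to be fine-tuned on text with naturally occurring errors. A hedged sketch of such fine-tuning with the `transformers` Seq2Seq trainer is below; the one-pair dataset, column names, hyperparameters, and the `<LM>` prefix are illustrative assumptions, not the authors' recipe.

```python
# Hedged fine-tuning sketch for adapting the pre-trained corrector to data with
# natural errors. Dataset contents and hyperparameters are placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "ai-forever/sage-fredt5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Placeholder corpus of (noisy, corrected) pairs; replace with real data
# containing naturally occurring errors.
pairs = Dataset.from_dict({
    "source": ["И не чсно прохожим в этот день непогожйи почему я веселый такйо"],
    "target": ["И не ясно прохожим в этот день непогожий, почему я веселый такой."],
})

def preprocess(batch):
    # "<LM>" prefix follows the FRED-T5 convention; confirm against the README.
    model_inputs = tokenizer(["<LM>" + s for s in batch["source"]],
                             truncation=True, max_length=256)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=256)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = pairs.map(preprocess, batched=True, remove_columns=pairs.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="sage-fredt5-large-natural-errors",  # illustrative output path
        num_train_epochs=1,
        per_device_train_batch_size=8,
        learning_rate=1e-4,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```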
 