Update README.md
README.md
@@ -18,7 +18,7 @@ For translating between the two languages, there are not any working off-the-she
 | | |
 |---|---|
 | Widget | Try the widget in the top right corner |
-| Huggingface Spaces | Go to [mt5](https://huggingface.co/google/mt5-base) |
+| Huggingface Spaces | Go to [mt5](https://huggingface.co/google/mt5-base) |
 | | |
 ## Pretraining a T5-base
 There is an [mt5](https://huggingface.co/google/mt5-base) that includes Norwegian. Unfortunately a very small part of this is Nynorsk; there is only around 1GB Nynorsk text in mC4. Despite this, the mt5 also gives a BLEU score above 80. During the project we extracted all available Nynorsk text from the [Norwegian Colossal Corpus](https://github.com/NBAiLab/notram/blob/master/guides/corpus_v2_summary.md) at the National Library of Norway, and matched it (by material type i.e. book, newspapers and so on) with an equal amount of Bokmål. The corpus collection is described [here](https://github.com/NBAiLab/notram/blob/master/guides/nb_nn_balanced_corpus.md) and the total size is 19GB.