Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ datasets:
 widget:
 - text: "Skriv inn en tekst som du ønsker å oversette til en annen målform."
 ---
-# BLEU-SCORE 88.16 !!!
+# RECORD BLEU-SCORE 88.16 !!!

 # 🇳🇴 Bokmål ⇔ Nynorsk 🇳🇴
 Norwegian has two relatively similar written languages: Bokmål and Nynorsk. Historically, Nynorsk is a written norm based on dialects curated by the linguist Ivar Aasen in the mid-to-late 1800s, whereas Bokmål is a gradual 'Norwegization' of written Danish.
@@ -18,16 +18,15 @@ For translating between the two languages, there are not any working off-the-shelf
 | | |
 |---|---|
 | Widget | Try the widget in the top right corner |
-| Huggingface Spaces | Go to [mt5](https://huggingface.co/google/mt5-base) |
-| Google Docs Add-on (waiting approval) | Watch Gif-demo |
+| Huggingface Spaces | Go to [mt5](https://huggingface.co/google/mt5-base) | |
 | | |
 ## Pretraining a T5-base
 There is an [mt5](https://huggingface.co/google/mt5-base) that includes Norwegian. Unfortunately, a very small part of this is Nynorsk; there is only around 1GB of Nynorsk text in mC4. Despite this, mt5 also gives a BLEU score above 80. During the project we extracted all available Nynorsk text from the [Norwegian Colossal Corpus](https://github.com/NBAiLab/notram/blob/master/guides/corpus_v2_summary.md) at the National Library of Norway and matched it (by material type, i.e. books, newspapers and so on) with an equal amount of Bokmål. The corpus collection is described [here](https://github.com/NBAiLab/notram/blob/master/guides/nb_nn_balanced_corpus.md), and the total size is 19GB.

 ## Finetuning
-Training for [30] epochs with a learning rate of [7e-4], a batch size of [32] and a max source and target length of [512] fine tuning reached a
-![Add-on](bm2nn_demo.gif)
+Training for [30] epochs with a learning rate of [7e-4], a batch size of [32], and a max source and target length of [512], fine-tuning reached a SACREBLEU score of [87.94] during training and a test score of [**88.16**] after training.

+## How to use the model
 ```python
 # Set up the pipeline
 from transformers import pipeline
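# ---------------------------------------------------------------------------
# Hedged usage sketch, not the README's own example: "text2text-generation"
# is an assumed pipeline task for a T5-style seq2seq model, and MODEL_ID is a
# hypothetical placeholder for this repository's Hub ID.
MODEL_ID = "<model-id>"  # placeholder: replace with the actual model ID

translator = pipeline("text2text-generation", model=MODEL_ID)

# Input in one written norm, output in the other (Bokmål <-> Nynorsk). The
# example sentence is the widget text from the front matter ("Enter a text
# that you want to translate into the other written norm").
result = translator(
    "Skriv inn en tekst som du ønsker å oversette til en annen målform.",
    max_length=512,
)
print(result[0]["generated_text"])
```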
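The Finetuning section above lists only hyperparameters and the resulting SACREBLEU scores; the training script itself is not part of this README. As a hedged sketch under those assumptions, the reported settings could be expressed with transformers' `Seq2SeqTrainingArguments`, and the metric computed with the `sacrebleu` package (the `output_dir` value and the helper below are made up for illustration):

```python
# Hedged sketch only: maps the hyperparameters reported under "Finetuning"
# onto Seq2SeqTrainingArguments and shows how a corpus-level SACREBLEU score
# can be computed. This is not the project's actual training script.
import sacrebleu
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="nb-nn-translation",  # hypothetical output directory
    num_train_epochs=30,             # [30] epochs
    learning_rate=7e-4,              # [7e-4]
    per_device_train_batch_size=32,  # batch size of [32]
    predict_with_generate=True,      # generate translations during evaluation
)
# The max source and target length of [512] is applied when tokenizing,
# e.g. tokenizer(text, max_length=512, truncation=True).

def sacrebleu_score(predictions, references):
    """Corpus-level SACREBLEU over aligned lists of strings (88.16 on test here)."""
    return sacrebleu.corpus_bleu(predictions, [references]).score
```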