
opus-mt-tc-bible-big-gmw-deu_eng_fra_por_spa

Table of Contents

  • Model Details
  • Uses
  • Risks, Limitations and Biases
  • How to Get Started With the Model
  • Training
  • Evaluation
  • Citation Information
  • Acknowledgements

Model Details

Neural machine translation model for translating from West Germanic languages (gmw) to German, English, French, Portuguese and Spanish (deu+eng+fra+por+spa).

This model is part of the OPUS-MT project, an effort to make neural machine translation models widely available and accessible for many languages in the world. All models are originally trained using the amazing framework of Marian NMT, an efficient NMT implementation written in pure C++. The models have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from OPUS and training pipelines use the procedures of OPUS-MT-train.

Model Description:

This is a multilingual translation model with multiple target languages. A sentence-initial language token is required in the form of >>id<< (id = a valid target language ID), e.g. >>deu<< for German.
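As an illustration only, here is a minimal sketch of how such tagged inputs can be assembled in Python; the tag_source helper is hypothetical and not part of any library:

# Hypothetical helper: prepend the required target-language token to a source sentence.
def tag_source(text: str, target_lang: str) -> str:
    return f">>{target_lang}<< {text}"

# Dutch source sentence, to be translated into English.
print(tag_source("Dit is een voorbeeldzin.", "eng"))
# >>eng<< Dit is een voorbeeldzin.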

Uses

This model can be used for translation and text-to-text generation.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that the model is trained on various public data sets that may contain content that is disturbing or offensive, and that can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

How to Get Started With the Model

A short code example:

from transformers import MarianMTModel, MarianTokenizer

# Each source sentence must start with a target-language token, e.g. >>deu<< or >>spa<<.
src_text = [
    ">>deu<< Replace this with text in an accepted source language.",
    ">>spa<< This is the second sentence."
]

# The model is published on the Hugging Face Hub under the Helsinki-NLP organization.
model_name = "Helsinki-NLP/opus-mt-tc-bible-big-gmw-deu_eng_fra_por_spa"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch, generate translations, and decode them back to text.
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

You can also use OPUS-MT models with the transformers pipelines, for example:

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-bible-big-gmw-deu_eng_fra_por_spa")
print(pipe(">>deu<< Replace this with text in an accepted source language."))
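The pipeline also accepts a batch of sentences, and each sentence can carry its own target-language token. A minimal sketch, assuming the same Hub model ID; the example sentences are made up for illustration:

from transformers import pipeline

pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-bible-big-gmw-deu_eng_fra_por_spa")

# The >>id<< token selects the output language per sentence, so one batch can mix targets.
batch = [
    ">>fra<< The weather is nice today.",
    ">>por<< The weather is nice today.",
]
for result in pipe(batch):
    print(result["translation_text"])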

Training

Evaluation

langpair testset chr-F BLEU #sent #words
afr-deu tatoeba-test-v2021-08-07 0.68492 48.8 1583 9105
afr-eng tatoeba-test-v2021-08-07 0.72943 59.6 1374 9622
afr-spa tatoeba-test-v2021-08-07 0.72793 58.4 448 2783
deu-deu tatoeba-test-v2021-08-07 0.59840 34.8 2500 20806
deu-eng tatoeba-test-v2021-08-07 0.65957 48.5 17565 149462
deu-fra tatoeba-test-v2021-08-07 0.68054 50.2 12418 102721
deu-por tatoeba-test-v2021-08-07 0.63368 42.8 10000 81482
deu-spa tatoeba-test-v2021-08-07 0.68198 49.3 10521 82570
eng-deu tatoeba-test-v2021-08-07 0.62497 40.5 17565 151568
eng-eng tatoeba-test-v2021-08-07 0.79878 57.3 12062 115106
eng-fra tatoeba-test-v2021-08-07 0.68567 50.9 12681 106378
eng-por tatoeba-test-v2021-08-07 0.72204 53.4 13222 105265
eng-spa tatoeba-test-v2021-08-07 0.72539 55.3 16583 134710
fry-eng tatoeba-test-v2021-08-07 0.55137 37.0 220 1573
gos-deu tatoeba-test-v2021-08-07 0.46120 24.7 207 1168
gos-eng tatoeba-test-v2021-08-07 0.38628 22.3 1154 5635
gsw-eng tatoeba-test-v2021-08-07 0.43003 27.5 205 990
ltz-deu tatoeba-test-v2021-08-07 0.48474 32.0 347 2208
ltz-eng tatoeba-test-v2021-08-07 0.65366 56.4 293 1840
nds-deu tatoeba-test-v2021-08-07 0.65251 45.9 9999 74564
nds-eng tatoeba-test-v2021-08-07 0.61858 44.7 2500 17589
nds-fra tatoeba-test-v2021-08-07 0.60412 43.9 857 5676
nds-por tatoeba-test-v2021-08-07 0.58778 39.5 207 1256
nds-spa tatoeba-test-v2021-08-07 0.63404 43.9 923 5540
nld-deu tatoeba-test-v2021-08-07 0.72998 55.7 10218 74131
nld-eng tatoeba-test-v2021-08-07 0.74362 60.9 12696 89978
nld-fra tatoeba-test-v2021-08-07 0.68461 48.0 11548 82974
nld-por tatoeba-test-v2021-08-07 0.68798 49.3 2500 17326
nld-spa tatoeba-test-v2021-08-07 0.69971 51.6 10113 74981
yid-eng tatoeba-test-v2021-08-07 0.49807 31.5 2483 15452
yid-fra tatoeba-test-v2021-08-07 0.54147 31.9 384 2455
afr-deu flores101-devtest 0.57831 28.4 1012 25094
afr-eng flores101-devtest 0.74272 53.8 1012 24721
afr-fra flores101-devtest 0.61936 36.4 1012 28343
afr-por flores101-devtest 0.61309 35.4 1012 26519
afr-spa flores101-devtest 0.51393 22.9 1012 29199
deu-spa flores101-devtest 0.52438 23.9 1012 29199
eng-deu flores101-devtest 0.64236 37.2 1012 25094
eng-spa flores101-devtest 0.55524 27.1 1012 29199
nld-deu flores101-devtest 0.53435 22.1 1012 25094
nld-eng flores101-devtest 0.58686 30.0 1012 24721
nld-fra flores101-devtest 0.56292 28.2 1012 28343
afr-deu flores200-devtest 0.58456 29.5 1012 25094
afr-eng flores200-devtest 0.74857 54.7 1012 24721
afr-fra flores200-devtest 0.62537 37.2 1012 28343
afr-por flores200-devtest 0.61751 36.1 1012 26519
afr-spa flores200-devtest 0.51647 23.2 1012 29199
deu-eng flores200-devtest 0.67103 41.4 1012 24721
deu-fra flores200-devtest 0.62658 36.8 1012 28343
deu-por flores200-devtest 0.60909 34.8 1012 26519
deu-spa flores200-devtest 0.52584 24.2 1012 29199
eng-deu flores200-devtest 0.64560 37.5 1012 25094
eng-fra flores200-devtest 0.70736 49.1 1012 28343
eng-por flores200-devtest 0.71065 49.5 1012 26519
eng-spa flores200-devtest 0.55738 27.4 1012 29199
lim-deu flores200-devtest 0.45062 16.1 1012 25094
lim-eng flores200-devtest 0.48217 21.8 1012 24721
lim-fra flores200-devtest 0.44347 18.5 1012 28343
lim-por flores200-devtest 0.42527 16.8 1012 26519
ltz-deu flores200-devtest 0.60114 31.3 1012 25094
ltz-eng flores200-devtest 0.64345 39.3 1012 24721
ltz-fra flores200-devtest 0.59368 33.8 1012 28343
ltz-por flores200-devtest 0.51545 24.8 1012 26519
ltz-spa flores200-devtest 0.44821 17.5 1012 29199
nld-deu flores200-devtest 0.53650 22.4 1012 25094
nld-eng flores200-devtest 0.59102 30.6 1012 24721
nld-fra flores200-devtest 0.56608 28.7 1012 28343
nld-por flores200-devtest 0.54728 26.7 1012 26519
nld-spa flores200-devtest 0.49175 20.1 1012 29199
tpi-deu flores200-devtest 0.40350 10.9 1012 25094
tpi-eng flores200-devtest 0.48289 19.6 1012 24721
tpi-fra flores200-devtest 0.43428 16.1 1012 28343
tpi-por flores200-devtest 0.42966 15.4 1012 26519
deu-eng generaltest2022 0.56042 31.0 1984 37634
deu-fra generaltest2022 0.61145 37.6 1984 38276
eng-deu generaltest2022 0.60090 32.5 2037 38914
deu-eng multi30k_test_2016_flickr 0.60974 40.1 1000 12955
deu-fra multi30k_test_2016_flickr 0.62493 38.8 1000 13505
eng-deu multi30k_test_2016_flickr 0.64164 35.3 1000 12106
eng-fra multi30k_test_2016_flickr 0.71137 50.7 1000 13505
deu-eng multi30k_test_2017_flickr 0.63118 40.6 1000 11374
deu-fra multi30k_test_2017_flickr 0.62614 37.0 1000 12118
eng-deu multi30k_test_2017_flickr 0.62518 33.4 1000 10755
eng-fra multi30k_test_2017_flickr 0.71402 50.3 1000 12118
deu-eng multi30k_test_2017_mscoco 0.55495 32.1 461 5231
deu-fra multi30k_test_2017_mscoco 0.59307 34.7 461 5484
eng-deu multi30k_test_2017_mscoco 0.58028 29.7 461 5158
eng-fra multi30k_test_2017_mscoco 0.73637 54.7 461 5484
deu-eng multi30k_test_2018_flickr 0.59367 36.7 1071 14689
deu-fra multi30k_test_2018_flickr 0.57388 31.3 1071 15867
eng-deu multi30k_test_2018_flickr 0.59998 30.8 1071 13703
eng-fra multi30k_test_2018_flickr 0.65354 41.6 1071 15867
eng-fra newsdiscusstest2015 0.63308 37.7 1500 27975
deu-eng newssyscomb2009 0.55170 28.3 502 11818
deu-fra newssyscomb2009 0.56021 27.4 502 12331
deu-spa newssyscomb2009 0.55546 28.1 502 12503
eng-deu newssyscomb2009 0.53919 23.0 502 11271
eng-fra newssyscomb2009 0.58384 29.5 502 12331
eng-spa newssyscomb2009 0.58266 31.0 502 12503
deu-eng newstest2008 0.54434 27.0 2051 49380
deu-fra newstest2008 0.55076 26.2 2051 52685
deu-spa newstest2008 0.54056 25.6 2051 52586
eng-deu newstest2008 0.52906 23.0 2051 47447
eng-fra newstest2008 0.55247 26.8 2051 52685
eng-spa newstest2008 0.56423 29.6 2051 52586
deu-eng newstest2009 0.53972 26.7 2525 65399
deu-fra newstest2009 0.53975 25.6 2525 69263
deu-spa newstest2009 0.53677 25.6 2525 68111
eng-deu newstest2009 0.53097 22.1 2525 62816
eng-fra newstest2009 0.57542 29.1 2525 69263
eng-spa newstest2009 0.57733 29.8 2525 68111
deu-eng newstest2010 0.58278 30.2 2489 61711
deu-fra newstest2010 0.57876 29.0 2489 66022
deu-spa newstest2010 0.59402 32.6 2489 65480
eng-deu newstest2010 0.54587 25.3 2489 61503
eng-fra newstest2010 0.59460 32.0 2489 66022
eng-spa newstest2010 0.61861 36.3 2489 65480
deu-eng newstest2011 0.55074 26.8 3003 74681
deu-fra newstest2011 0.55879 27.4 3003 80626
deu-spa newstest2011 0.56593 30.2 3003 79476
eng-deu newstest2011 0.52619 22.7 3003 72981
eng-fra newstest2011 0.60960 34.1 3003 80626
eng-spa newstest2011 0.62056 38.5 3003 79476
deu-eng newstest2012 0.56290 28.4 3003 72812
deu-fra newstest2012 0.55931 27.3 3003 78011
deu-spa newstest2012 0.57369 31.5 3003 79006
eng-deu newstest2012 0.52668 23.3 3003 72886
eng-fra newstest2012 0.59076 31.6 3003 78011
eng-spa newstest2012 0.62361 38.8 3003 79006
deu-eng newstest2013 0.58065 31.8 3000 64505
deu-fra newstest2013 0.56431 30.0 3000 70037
deu-spa newstest2013 0.56965 31.5 3000 70528
eng-deu newstest2013 0.55423 26.9 3000 63737
eng-fra newstest2013 0.58760 33.1 3000 70037
eng-spa newstest2013 0.59825 35.1 3000 70528
deu-eng newstest2014 0.59617 32.9 3003 67337
eng-deu newstest2014 0.58847 28.0 3003 62688
eng-fra newstest2014 0.65294 39.9 3003 77306
deu-eng newstest2015 0.59741 33.8 2169 46443
eng-deu newstest2015 0.59474 31.0 2169 44260
deu-eng newstest2016 0.64981 40.6 2999 64119
eng-deu newstest2016 0.63839 37.1 2999 62669
deu-eng newstest2017 0.60957 35.5 3004 64399
eng-deu newstest2017 0.58967 30.0 3004 61287
deu-eng newstest2018 0.66739 43.4 2998 67012
eng-deu newstest2018 0.68858 44.9 2998 64276
deu-eng newstest2019 0.63671 39.6 2000 39227
deu-fra newstest2019 0.63043 36.1 1701 42509
eng-deu newstest2019 0.65934 41.4 1997 48746
deu-eng newstest2020 0.60800 34.5 785 38220
deu-fra newstest2020 0.60544 33.1 1619 36890
eng-deu newstest2020 0.60078 31.7 1418 52383
deu-eng newstest2021 0.60048 31.9 1000 20180
deu-fra newstest2021 0.59590 31.8 1000 23757
eng-deu newstest2021 0.56133 25.6 1002 27970
deu-eng newstestALL2020 0.60800 34.5 785 38220
eng-deu newstestALL2020 0.60078 31.7 1418 52383
deu-eng newstestB2020 0.60795 34.4 785 37696
eng-deu newstestB2020 0.59256 31.5 1418 53092
afr-deu ntrex128 0.55289 25.8 1997 48761
afr-eng ntrex128 0.72558 51.8 1997 47673
afr-fra ntrex128 0.56601 29.3 1997 53481
afr-por ntrex128 0.55396 28.1 1997 51631
afr-spa ntrex128 0.58558 33.7 1997 54107
deu-eng ntrex128 0.61722 33.8 1997 47673
deu-fra ntrex128 0.55908 28.6 1997 53481
deu-por ntrex128 0.54059 25.7 1997 51631
deu-spa ntrex128 0.56887 30.8 1997 54107
eng-deu ntrex128 0.58492 29.8 1997 48761
eng-fra ntrex128 0.61349 35.2 1997 53481
eng-por ntrex128 0.59785 33.4 1997 51631
eng-spa ntrex128 0.63935 40.1 1997 54107
ltz-deu ntrex128 0.51469 21.9 1997 48761
ltz-eng ntrex128 0.58627 32.4 1997 47673
ltz-fra ntrex128 0.50491 22.8 1997 53481
ltz-por ntrex128 0.45364 18.7 1997 51631
ltz-spa ntrex128 0.47568 21.6 1997 54107
nld-deu ntrex128 0.55943 25.7 1997 48761
nld-eng ntrex128 0.63470 36.1 1997 47673
nld-fra ntrex128 0.55832 27.5 1997 53481
nld-por ntrex128 0.54714 27.3 1997 51631
nld-spa ntrex128 0.57692 32.1 1997 54107
eng-fra tico19-test 0.62559 39.5 2100 64661
eng-por tico19-test 0.72765 49.8 2100 62729
eng-spa tico19-test 0.72905 51.6 2100 66563
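The chr-F and BLEU figures above are corpus-level scores on the listed test sets. As an illustration of how such metrics can be computed for a set of system outputs and references, here is a minimal sketch using the sacrebleu library; the file names are placeholders, not files shipped with this model:

import sacrebleu

# Placeholder files: one system translation and one reference per line, line-aligned.
with open("hypotheses.txt", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]

# Corpus-level BLEU and chrF. Note that sacrebleu reports chrF on a 0-100 scale,
# while the table above lists chr-F on a 0-1 scale.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
print(f"chrF = {chrf.score:.2f}")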

Citation Information

@article{tiedemann2023democratizing,
  title={Democratizing neural machine translation with {OPUS-MT}},
  author={Tiedemann, J{\"o}rg and Aulamo, Mikko and Bakshandaeva, Daria and Boggia, Michele and Gr{\"o}nroos, Stig-Arne and Nieminen, Tommi and Raganato, Alessandro and Scherrer, Yves and Vazquez, Raul and Virpioja, Sami},
  journal={Language Resources and Evaluation},
  number={58},
  pages={713--755},
  year={2023},
  publisher={Springer Nature},
  issn={1574-0218},
  doi={10.1007/s10579-023-09704-w}
}

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

Acknowledgements

The work is supported by the HPLT project, funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland, and the EuroHPC supercomputer LUMI.

Model conversion info

  • transformers version: 4.45.1
  • OPUS-MT git hash: 0882077
  • port time: Tue Oct 8 11:18:52 EEST 2024
  • port machine: LM0-400-22516.local