---
license: cc-by-2.5
language:
- lt
- en
datasets:
- scoris/en-lt-merged-data
---
# Overview

This is a Lithuanian-English translation model.
Original model: [Helsinki-NLP/opus-mt-tc-big-lt-en](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-lt-en)
Fine-tuned on a large merged data set: [scoris/en-lt-merged-data](https://huggingface.co/datasets/scoris/en-lt-merged-data) (5.4 million sentence pairs)
For English-Lithuanian translation, see another model: [scoris/scoris-mt-en-lt](https://huggingface.co/scoris/scoris-mt-en-lt)
Trained for 3 epochs.
Made by the [Scoris](https://scoris.lt) team.
# Evaluation
| LT-EN| BLEU |
|-|------|
| scoris/scoris-mt-lt-en| 43.8 |
| Helsinki-NLP/opus-mt-tc-big-lt-en| 36.8 |
| Google Translate| 31.9 |
| DeepL| 36.1 |

_Evaluated on the scoris/en-lt-merged-data validation set. Google Translate and DeepL were evaluated on a random sample of 1000 sentence pairs._
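
The sampling step mentioned in the note can be sketched as follows. This is a minimal illustration, not the procedure the authors actually ran: the function name `sample_pairs`, the fixed seed, and the toy data are assumptions. Only the sample size of 1000 comes from the note.

```python
import random

def sample_pairs(pairs, k=1000, seed=0):
    """Return k randomly chosen (source, reference) pairs without replacement.

    `seed` is a hypothetical choice that makes the sample reproducible.
    """
    rng = random.Random(seed)
    # If no more than k pairs are available, use all of them.
    if len(pairs) <= k:
        return list(pairs)
    return rng.sample(pairs, k)

# Toy stand-in for a validation set of (Lithuanian, English) pairs.
validation_pairs = [(f"lt sentence {i}", f"en sentence {i}") for i in range(5000)]
subset = sample_pairs(validation_pairs)
```

The sampled source sentences would then be sent to each external service, and the returned translations scored against the matching references with the same BLEU implementation used for the other rows.
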

According to [Google](https://cloud.google.com/translate/automl/docs/evaluate), BLEU scores can be interpreted as follows:

| BLEU Score | Interpretation
|----------|---------|
| < 10 | Almost useless
| 10 - 19 | Hard to get the gist
| 20 - 29 | The gist is clear, but has significant grammatical errors
| 30 - 40 | Understandable to good translations
| **40 - 50** | **High quality translations**
| 50 - 60 | Very high quality, adequate, and fluent translations
| > 60 | Quality often better than human
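
As a small illustration, the bands in the table can be turned into a lookup. This is a sketch under one assumption that the table does not settle: each band is taken to include its lower bound and exclude its upper bound, so a score of exactly 40 falls in the 40 - 50 band. The names `BLEU_BANDS` and `interpret_bleu` are hypothetical.

```python
# Bands copied from the table above as (lower bound, label), highest first.
# Assumption: each band includes its lower bound and excludes its upper bound.
BLEU_BANDS = [
    (60, "Quality often better than human"),
    (50, "Very high quality, adequate, and fluent translations"),
    (40, "High quality translations"),
    (30, "Understandable to good translations"),
    (20, "The gist is clear, but has significant grammatical errors"),
    (10, "Hard to get the gist"),
    (0, "Almost useless"),
]

def interpret_bleu(score):
    """Map a BLEU score on the 0-100 scale to its interpretation band."""
    if not 0 <= score <= 100:
        raise ValueError("BLEU score must be between 0 and 100")
    # The first band whose lower bound the score reaches is the match.
    for lower, label in BLEU_BANDS:
        if score >= lower:
            return label

for system, bleu in [("scoris/scoris-mt-lt-en", 43.8), ("Google Translate", 31.9)]:
    print(f"{system}: {bleu} -> {interpret_bleu(bleu)}")
```
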
# Usage
You can use the model in the following way:
```python
from transformers import MarianMTModel, MarianTokenizer
# Specify the model identifier on Hugging Face Model Hub
model_name = "scoris/scoris-mt-lt-en"
# Load the model and tokenizer from Hugging Face
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
src_text = [
"Kartą, senų senovėje, buvo viena mergaitė ir gyveno ji su savo mama mažoje jaukioje trobelėje prie miško. ",
"Mergaitę žmonės vadino Raudonkepuraite, nes ji dažnai dėvėdavo raudoną apsiaustėlį su kapišonu. ",
"Mergaitė mielai gobdavosi šiuo apsiaustėliu, nes jį buvo gavusi iš savo močiutės, kuri gyveno namelyje už miško ir labai mylėjo Raudonkepuraitę. ",
"Vieną dieną mama priruošė Raudonkepuraitei pilną krepšelį įvairiausių gėrybių.",
"Pridėjo obuoliukų, kriaušaičių, braškių, taip pat skanių pyragėlių, kuriuos pati buvo iškepusi, sūrio ir gabalėlį mėsos bei didelį išdabintą tortą."
]
# Tokenize the text and generate translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
# Print out the translations
for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))
# Output:
# Once upon a time there was a girl, and she lived with her mother in a small cozy hut by the forest.
# The girl was called the Red cape because she often wore a red cape.
# The girl would gladly wear this coat, because she had it from her grandmother, who lived in a house outside the forest and loved Redcape very much.
# One day my mother prepared a basket full of all kinds of good things for the Red cape.
# He added apples, pears, strawberries, as well as delicious cakes that he had baked, cheese and a piece of meat, and a large cake.
```
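
The example above translates all sentences in a single call. For longer inputs it may help to process the text in smaller batches. The sketch below shows one way to do that. The helper `translate_in_batches`, its `batch_size` default, and the stand-in `fake_translate` are hypothetical and not part of the model or the `transformers` library.

```python
def translate_in_batches(texts, translate_batch, batch_size=16):
    """Translate `texts` in chunks of `batch_size`, preserving order.

    `translate_batch` is any callable that maps a list of source strings
    to a list of translated strings of the same length.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        results.extend(translate_batch(chunk))
    return results

# Stand-in translator so the sketch runs without downloading the model.
def fake_translate(batch):
    return [s.upper() for s in batch]

print(translate_in_batches(["labas", "ačiū", "taip"], fake_translate, batch_size=2))
```

With the real model, `translate_batch` could wrap the tokenizer call, `model.generate`, and `tokenizer.batch_decode(..., skip_special_tokens=True)` from the example above.
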