metadata
language:
  - es
  - qu
tags:
  - quechua
  - translation
  - spanish
license: apache-2.0
metrics:
  - bleu
  - sacrebleu
widget:
  - text: Dios ama a los hombres
  - text: A pesar de todo, soy feliz
  - text: ¿Qué harán allí?
  - text: Debes aprender a respetar

Spanish to Quechua translator

This model is a fine-tuned version of t5-small.

Model description

t5-small-finetuned-spanish-to-quechua was trained for 46 epochs on 102,747 sentences. Validation used 12,844 sentences, and 12,843 sentences were held out for testing.
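As a quick check on the figures above, the following sketch computes the total corpus size and the share of each split. The counts come from the description; the dictionary and variable names are only illustrative.

```python
# Sentence counts as stated in the model description.
splits = {"train": 102_747, "validation": 12_844, "test": 12_843}

# Total number of sentence pairs across all three splits.
total = sum(splits.values())

# Fraction of the corpus that falls in each split.
shares = {name: count / total for name, count in splits.items()}

print(f"total sentences: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The counts correspond to roughly an 80/10/10 train/validation/test split.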

Intended uses & limitations

A large part of the dataset was extracted from biblical texts, so the model performs better on sentences that resemble that material.

How to use

You can load the model and its tokenizer as follows:

>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
>>> model_name = 'hackathon-pln-es/t5-small-finetuned-spanish-to-quechua'
>>> model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)

To translate a sentence:

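>>> sentence = "Entonces dijo"
>>> # Tokenize the Spanish sentence into PyTorch tensors
>>> inputs = tokenizer(sentence, return_tensors="pt")
>>> output = model.generate(inputs["input_ids"], max_length=40, num_beams=4, early_stopping=True)
>>> # Drop special tokens such as <pad> and </s> from the decoded text
>>> translation = tokenizer.decode(output[0], skip_special_tokens=True)
>>> print('Original sentence: {}\nTranslated sentence: {}'.format(sentence, translation))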
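The single-sentence example above can be extended to several sentences at once. The sketch below is one possible way to do this, not part of the original model card: `load_model` and `translate_batch` are hypothetical helper names, and the generation settings simply reuse the values shown above.

```python
def load_model(model_name="hackathon-pln-es/t5-small-finetuned-spanish-to-quechua"):
    """Load the model and tokenizer (downloads the weights on first use)."""
    # Imported here so translate_batch can be reused with any compatible pair.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer


def translate_batch(sentences, model, tokenizer, max_length=40, num_beams=4):
    """Translate a list of Spanish sentences; returns one string per input."""
    if not sentences:
        return []
    # padding=True pads shorter sentences so the batch forms a single tensor.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        # The attention mask tells the model to ignore the padding positions.
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )
    # skip_special_tokens=True removes <pad> and </s> from the decoded text.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, calling `model, tokenizer = load_model()` and then `translate_batch(["Dios ama a los hombres", "¿Qué harán allí?"], model, tokenizer)` should return a list with one Quechua string per input sentence.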

Limitations and bias

Currently, this model can only translate into Ayacucho Quechua.

Training data

To train this model, we used the Spanish to Quechua dataset.

Evaluation results

We obtained the following metrics during the training process:

  • eval_bleu = 2.9691
  • eval_loss = 1.2064628601074219

Team members