
Introduction

This repository provides an implementation of T5 for Portuguese-to-English (PT-EN) translation using a modest hardware setup. We propose some changes to the tokenizer and to post-processing that improve the results, and we use a model pretrained on Portuguese as the starting point for translation. You can find more information in our repository. Also, check our paper!

Usage

Just follow the "Use in Transformers" instructions. T5 needs a short task prefix at the start of the input text to define the task, for example "translate Portuguese to English: ".
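As a minimal sketch of that direct usage (not the authors' reference code), the snippet below prepends the task prefix and calls `generate` on the model. The helper names `with_task_prefix` and `translate` and the `max_new_tokens` default are illustrative assumptions. The prefix string is the one used in the pipeline example in this card.

```python
# Task prefix as used in this card's pipeline example.
TASK_PREFIX = "translate Portuguese to English: "


def with_task_prefix(text: str) -> str:
    """Prepend the task prefix unless the text already starts with it."""
    return text if text.startswith(TASK_PREFIX) else TASK_PREFIX + text


def translate(text: str,
              model_name: str = "unicamp-dl/translation-pt-en-t5",
              max_new_tokens: int = 64) -> str:
    """Hypothetical helper: translate one sentence with model.generate()."""
    # Imported lazily so with_task_prefix() works without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Tokenize the prefixed input and return PyTorch tensors.
    inputs = tokenizer(with_task_prefix(text), return_tensors="pt")
    # max_new_tokens is an assumed cap on output length; adjust as needed.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(translate("Eu gosto de comer arroz."))
```

Loading the tokenizer and model inside `translate` keeps the sketch short. For repeated calls it would be better to load them once and reuse them.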

You can also create a pipeline for it. An example with the sentence "Eu gosto de comer arroz." is:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load the tokenizer and the fine-tuned T5 model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("unicamp-dl/translation-pt-en-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("unicamp-dl/translation-pt-en-t5")

# Wrap both in a text-to-text generation pipeline.
pten_pipeline = pipeline('text2text-generation', model=model, tokenizer=tokenizer)

# The input must start with the task prefix.
pten_pipeline("translate Portuguese to English: Eu gosto de comer arroz.")
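The pipeline call returns dictionaries with a `generated_text` key rather than plain strings. The sketch below, under that assumption, shows one way to translate several sentences at once and collect the translated strings. The helpers `extract_translations` and `translate_batch` are hypothetical names introduced here, and `pipe` stands for a pipeline object such as `pten_pipeline` from the example above.

```python
from typing import Callable, List

# Task prefix as used in this card's pipeline example.
TASK_PREFIX = "translate Portuguese to English: "


def extract_translations(outputs) -> List[str]:
    """Pull the 'generated_text' strings out of a pipeline result."""
    texts = []
    for item in outputs:
        # If results are nested as one list per input, keep the first candidate.
        if isinstance(item, list):
            item = item[0]
        texts.append(item["generated_text"])
    return texts


def translate_batch(pipe: Callable, sentences: List[str]) -> List[str]:
    """Prefix each sentence, run the pipeline once, and return plain strings."""
    prompts = [TASK_PREFIX + sentence for sentence in sentences]
    return extract_translations(pipe(prompts))


# Example call, assuming pten_pipeline was created as shown in this card:
# translate_batch(pten_pipeline, ["Eu gosto de comer arroz.", "Bom dia."])
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded.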

Citation

@inproceedings{lopes-etal-2020-lite,
    title = "Lite Training Strategies for {P}ortuguese-{E}nglish and {E}nglish-{P}ortuguese Translation",
    author = "Lopes, Alexandre  and
      Nogueira, Rodrigo  and
      Lotufo, Roberto  and
      Pedrini, Helio",
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.wmt-1.90",
    pages = "833--840",
}