AfriTeVa base is a multilingual sequence-to-sequence model pretrained on 10 African languages:
Afaan Oromoo (orm), Amharic (amh), Gahuza (gah), Hausa (hau), Igbo (igb), Nigerian Pidgin (pcm), Somali (som), Swahili (swa), Tigrinya (tig), Yoruba (yor)
afriteva_base is a pre-trained model, primarily intended to be fine-tuned on multilingual sequence-to-sequence tasks.
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
>>> # Load the pre-trained tokenizer and model from the Hugging Face Hub
>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_base")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_base")
>>> src_text = "Ó hùn ọ́ láti di ara wa bí?"
>>> tgt_text = "Would you like to be?"
>>> # Tokenize the source text as encoder input
>>> model_inputs = tokenizer(src_text, return_tensors="pt")
>>> # Tokenize the target text to obtain the label ids
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(tgt_text, return_tensors="pt").input_ids
...
>>> model(**model_inputs, labels=labels)  # forward pass; the output includes the loss
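When fine-tuning on batches of more than one example, target sequences are usually padded to a common length, and the padded positions are commonly excluded from the loss by setting them to -100, the default ignore index of PyTorch's cross-entropy loss. The sketch below illustrates that convention in plain Python. The helper `mask_padding`, the toy token ids, and the pad id of 0 are assumptions for illustration and are not taken from the AfriTeVa repository; in practice, read the pad id from `tokenizer.pad_token_id`.

```python
# Hypothetical helper for illustration; not part of AfriTeVa or transformers.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_padding(label_ids, pad_token_id):
    """Return a copy of a batch of label id lists with padding replaced by IGNORE_INDEX."""
    return [
        [IGNORE_INDEX if tok == pad_token_id else tok for tok in row]
        for row in label_ids
    ]


# Toy batch of made-up token ids, padded with 0 to a common length.
# A pad id of 0 is an assumption; use tokenizer.pad_token_id in practice.
batch = [[37, 912, 5, 1, 0, 0], [44, 1, 0, 0, 0, 0]]
masked = mask_padding(batch, pad_token_id=0)
print(masked)
```

The masked lists can then be converted to a tensor and passed as `labels` in the forward pass shown above.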
For information on training procedures, please refer to the AfriTeVa paper or repository.
coming soon ...