---
license: mit
language:
- wo
- fr
metrics:
- bleu
pipeline_tag: translation
tags:
- text-generation-inference
---
Model Documentation: Wolof to French Translation with NLLB-200
Model Overview
This document describes a machine translation model fine-tuned from Meta's NLLB-200 for translating from Wolof to French. The model, hosted on the Hugging Face Hub at `cifope/nllb-200-wo-fr-distilled-600M`, is based on the distilled 600M-parameter version of NLLB-200 and has been fine-tuned specifically for translation between Wolof and French. The functions below use it in both directions.
Dependencies
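The model requires the Hugging Face `transformers` library, together with PyTorch (the examples request `'pt'` tensors) and `sentencepiece` (used by the SentencePiece-based `NllbTokenizer`). Install them with:

```bash
pip install transformers torch sentencepiece
```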
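To confirm that these packages are importable before loading the model, a small check like the one below can help. It is a minimal sketch: `missing_packages` is a hypothetical helper (not part of any library), and the package list assumes that `torch` and `sentencepiece` are needed alongside `transformers`.

```python
import importlib.util

# Packages this card relies on (torch and sentencepiece are assumed requirements)
REQUIRED = ["transformers", "torch", "sentencepiece"]


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```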
Setup
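Import the necessary classes from the `transformers` library, then load the fine-tuned model together with the tokenizer of the base NLLB-200 checkpoint:

```python
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer

# Fine-tuned Wolof/French weights
model = AutoModelForSeq2SeqLM.from_pretrained('cifope/nllb-200-wo-fr-distilled-600M')
# The tokenizer is loaded from the original NLLB-200 distilled checkpoint
tokenizer = NllbTokenizer.from_pretrained('facebook/nllb-200-distilled-600M')
```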
Translation Functions
Translate from French to Wolof
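The `translate` function translates text from French to Wolof. `src_lang` and `tgt_lang` take NLLB-200 language codes, and `a` and `b` set the output length budget: generation produces at most `int(a + b * n)` new tokens, where `n` is the tokenized input length.

```python
def translate(text, src_lang='fra_Latn', tgt_lang='wol_Latn', a=16, b=1.5, max_input_length=1024, **kwargs):
    # Tell the tokenizer which languages are involved (sets the source-language prefix token)
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    # Tokenize a string or a list of strings, truncating inputs longer than max_input_length tokens
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
    result = model.generate(
        **inputs.to(model.device),
        # Force the first generated token to be the target-language code
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        # Allow at most a + b * (input length) new tokens
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        **kwargs  # extra generation options, e.g. num_beams
    )
    # Decode every sequence in the batch, dropping language and special tokens
    return tokenizer.batch_decode(result, skip_special_tokens=True)
```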
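To make `a` and `b` concrete: the cap passed to `generate` grows linearly with the tokenized input length. Because inputs are padded, `inputs.input_ids.shape[1]` is the length of the longest text in the batch, so every text in a batch shares the same cap. The sketch below reproduces only that arithmetic in plain Python; `max_new_tokens_for` is a hypothetical name, not part of `transformers`.

```python
def max_new_tokens_for(input_length, a=16, b=1.5):
    """Reproduce the cap used in translate(): a fixed allowance `a`
    plus `b` new tokens per input token, truncated to an int."""
    return int(a + b * input_length)


# A 10-token input may produce up to 16 + 1.5 * 10 = 31 new tokens
print(max_new_tokens_for(10))  # 31
# A batch padded to 40 tokens: 16 + 60 = 76 for every text in the batch
print(max_new_tokens_for(40))  # 76
```

Raising `b` leaves more room for target sentences that are longer than their source; lowering it bounds generation time on short inputs.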
Translate from Wolof to French
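The `reversed_translate` function translates text from Wolof to French. It is identical to `translate` except for its default language codes:

```python
def reversed_translate(text, src_lang='wol_Latn', tgt_lang='fra_Latn', a=16, b=1.5, max_input_length=1024, **kwargs):
    # Tell the tokenizer which languages are involved (sets the source-language prefix token)
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    # Tokenize a string or a list of strings, truncating inputs longer than max_input_length tokens
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
    result = model.generate(
        **inputs.to(model.device),
        # Force the first generated token to be the target-language code
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        # Allow at most a + b * (input length) new tokens
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        **kwargs  # extra generation options, e.g. num_beams
    )
    # Decode every sequence in the batch, dropping language and special tokens
    return tokenizer.batch_decode(result, skip_special_tokens=True)
```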
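Since `reversed_translate` differs from `translate` only in its default language codes, one possible simplification (a design suggestion, not how this card defines it) is to derive direction-specific helpers with `functools.partial`. To keep the sketch runnable without downloading the model, `translate_stub` is a hypothetical stand-in that only reports the language pair it receives; in practice you would partially apply the real `translate` instead.

```python
from functools import partial


def translate_stub(text, src_lang='fra_Latn', tgt_lang='wol_Latn', **kwargs):
    # Hypothetical stand-in for translate(): it only reports which language
    # pair would be used, so no model is needed to run this sketch.
    texts = [text] if isinstance(text, str) else list(text)
    return [f"{src_lang}->{tgt_lang}: {t}" for t in texts]


# Wolof-to-French helper obtained by fixing the default language codes
wo_to_fr = partial(translate_stub, src_lang='wol_Latn', tgt_lang='fra_Latn')

print(wo_to_fr("kilifa gi ñów"))
# ['wol_Latn->fra_Latn: kilifa gi ñów']

# Keyword arguments given at call time override the partial's defaults
print(wo_to_fr("kilifa gi ñów", tgt_lang='eng_Latn'))
# ['wol_Latn->eng_Latn: kilifa gi ñów']
```

Because call-time keywords take precedence over those stored in the `partial`, overriding `tgt_lang` keeps working exactly as it does with `reversed_translate`.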
Usage
To translate text, call `translate` or `reversed_translate` with the text and, optionally, different parameters. Both functions accept a single string or a list of strings and return a list of translated strings. For example:

```python
# French to Wolof
french_text = "L'argent peut être échangé à la seule banque des îles située à Stanley"
wolof_translation = translate(french_text)
print(wolof_translation)

# Wolof to French
wolof_text = "alkaati yi tàmbali nañu xàll léegi kilifa gi ñów"
french_translation = reversed_translate(wolof_text)
print(french_translation)

# Wolof to English: any NLLB-200 language code can be passed as tgt_lang
english_translation = reversed_translate(wolof_text, tgt_lang="eng_Latn")
print(english_translation)
```
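Because both functions accept a list of strings, many sentences can be translated in one call, but very large lists are usually better processed in fixed-size batches to bound memory use. The helper below is a hypothetical sketch, not part of this card: it splits the input into batches of `batch_size` and calls whichever translation function you pass in, such as `reversed_translate`. The default batch size of 8 is an arbitrary assumption to tune for your hardware.

```python
def translate_in_batches(texts, translate_fn, batch_size=8, **kwargs):
    """Translate `texts` in chunks of `batch_size`, preserving input order.

    `translate_fn` is any function with the same interface as translate()
    or reversed_translate(): it takes a list of strings and returns a list
    of translated strings. Extra keyword arguments are passed through.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        results.extend(translate_fn(batch, **kwargs))
    return results


# Usage with the model loaded above (not executed here; wolof_sentences is hypothetical):
# french = translate_in_batches(wolof_sentences, reversed_translate, batch_size=16)

# Self-contained check with a stand-in function that upper-cases its input
print(translate_in_batches(["a", "b", "c"], lambda batch: [t.upper() for t in batch], batch_size=2))
# ['A', 'B', 'C']
```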