NLLB Fine-tuned for Darija to Modern Standard Arabic Translation
This model is a fine-tuned version of facebook/nllb-200-distilled-600M for translating Moroccan Darija (ary) to Modern Standard Arabic (ar). The model was fine-tuned on a custom dataset using the Hugging Face transformers library.
The model was developed by Tachicart Ridouane and Bouzoubaa Karim.
Contact: tachicart@gmail.com
Model Details
- Base Model: facebook/nllb-200-distilled-600M
- Fine-tuning Library: Hugging Face transformers
- Languages Supported: Moroccan Darija (ary), Modern Standard Arabic (ar)
- Training Dataset: Custom dataset of Moroccan Darija and Modern Standard Arabic pairs in JSON format.
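The card says only that the training data is a set of Darija and Modern Standard Arabic pairs in JSON format. As a minimal sketch of how such a file might be read and split before fine-tuning, the snippet below assumes a JSON array of objects with the hypothetical keys `darija` and `msa`. The real schema is not documented here, and the helper names are illustrative.

```python
import json
import random
from typing import Dict, List, Tuple

# Hypothetical field names: the actual dataset schema is not documented.
SRC_KEY = "darija"
TGT_KEY = "msa"


def load_pairs(json_text: str) -> List[Dict[str, str]]:
    """Parse a JSON array of translation pairs and drop incomplete records."""
    records = json.loads(json_text)
    pairs = []
    for record in records:
        # Treat missing or null fields as empty strings.
        src = str(record.get(SRC_KEY) or "").strip()
        tgt = str(record.get(TGT_KEY) or "").strip()
        if src and tgt:
            pairs.append({SRC_KEY: src, TGT_KEY: tgt})
    return pairs


def train_validation_split(
    pairs: List[Dict[str, str]],
    validation_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

To use it on a file, read the file contents into a string, pass it to `load_pairs`, and then pass the resulting list to `train_validation_split`.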
Performance
The model was evaluated on a validation set to check translation quality. It captures colloquial Moroccan Arabic well, and additional training data could improve its performance further.
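The card does not name the metric used on the validation set. For a quick sanity check of your own outputs, the sketch below computes a token-overlap F1 between model translations and reference translations. This is a crude stand-in chosen because it needs no extra dependencies. It is not the authors' evaluation method, and the function names are illustrative.

```python
from collections import Counter
from typing import List


def token_f1(hypothesis: str, reference: str) -> float:
    """Whitespace-token F1 between one hypothesis and one reference."""
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    if not hyp_tokens or not ref_tokens:
        # Both empty counts as a perfect match, one empty as a miss.
        return float(hyp_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_token_f1(hypotheses: List[str], references: List[str]) -> float:
    """Average sentence-level token F1 over a validation set."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not hypotheses:
        return 0.0
    scores = [token_f1(h, r) for h, r in zip(hypotheses, references)]
    return sum(scores) / len(scores)
```

For reportable numbers, an established metric such as BLEU or chrF from a dedicated evaluation library is the better choice. This helper is only meant to catch obvious regressions.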
Limitations
- Dataset Size: The custom dataset consists of 21,000 samples, which may limit coverage of diverse expressions and rare terms.
- Colloquial Variations: Moroccan Arabic has many dialectal variations, which might not all be covered equally.
How to Use
You can use the model with the transformers library as follows:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("tachicart/nllb-ft-darija")
model = AutoModelForSeq2SeqLM.from_pretrained("tachicart/nllb-ft-darija")
# Example translation
inputs = tokenizer("كيفاش نقدر نربح بزاف ديال الفلوس بالزربة", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
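To translate several sentences at once, the snippet above can be wrapped in a small batching helper. The sketch below is one possible way to do it. The names `chunks`, `translate_batch`, and `load_model` are illustrative and not part of the model card. The optional `src_lang` and `tgt_lang` arguments use the NLLB language codes `ary_Arab` (Moroccan Arabic) and `arb_Arab` (Modern Standard Arabic). Whether this fine-tuned checkpoint expects them is an assumption, so both default to `None`, which reproduces the behaviour of the snippet above.

```python
from typing import Iterable, Iterator, List, Optional


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(
    texts: Iterable[str],
    tokenizer,
    model,
    batch_size: int = 8,
    max_new_tokens: int = 128,
    src_lang: Optional[str] = None,  # e.g. "ary_Arab" (assumption, see above)
    tgt_lang: Optional[str] = None,  # e.g. "arb_Arab" (assumption, see above)
) -> List[str]:
    """Translate a list of sentences in padded batches."""
    if src_lang is not None:
        tokenizer.src_lang = src_lang
    generate_kwargs = {"max_new_tokens": max_new_tokens}
    if tgt_lang is not None:
        # NLLB selects the output language through the first generated token.
        generate_kwargs["forced_bos_token_id"] = tokenizer.convert_tokens_to_ids(tgt_lang)

    translations: List[str] = []
    for batch in chunks(list(texts), batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, **generate_kwargs)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations


def load_model(name: str = "tachicart/nllb-ft-darija"):
    """Load the tokenizer and model (downloads the weights on first use)."""
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    return tokenizer, model
```

A typical call would be `tokenizer, model = load_model()` followed by `translate_batch(sentences, tokenizer, model)`. If the outputs come back in the wrong language, passing `src_lang="ary_Arab"` and `tgt_lang="arb_Arab"` is worth trying.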