NLLB Fine-tuned for Darija to Modern Standard Arabic Translation

This model is a fine-tuned version of facebook/nllb-200-distilled-600M for translating Moroccan Darija (ary) into Modern Standard Arabic (ar). It was fine-tuned on a custom dataset using the Hugging Face transformers library. The model was developed by Tachicart Ridouane and Bouzoubaa Karim (contact: tachicart@gmail.com).

Model Details

Base Model: facebook/nllb-200-distilled-600M
Fine-tuning Library: Hugging Face transformers
Languages Supported: Moroccan Darija (ary), Modern Standard Arabic (ar)
Training Dataset: Custom dataset of Moroccan Darija and Modern Standard Arabic pairs in JSON format.
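
The card does not document the JSON schema of the training data, so the following is only a minimal sketch of how such Darija/MSA pairs could be loaded and cleaned. The field names `darija` and `msa`, the sample sentences, and the helper `load_pairs` are all assumptions made for illustration, not the authors' actual format.

```python
import json
from typing import Dict, List

# Hypothetical record layout: the card only says "pairs in JSON format",
# so the keys "darija" and "msa" and these sentences are illustrative.
SAMPLE_JSON = """
[
  {"darija": "كيفاش نقدر نربح بزاف ديال الفلوس بالزربة", "msa": "كيف يمكنني أن أربح الكثير من المال بسرعة"},
  {"darija": "فين غادي؟", "msa": "إلى أين أنت ذاهب؟"},
  {"darija": "واش كاين شي جديد؟", "msa": ""}
]
"""


def load_pairs(raw: str, src_key: str = "darija", tgt_key: str = "msa") -> List[Dict[str, str]]:
    """Parse a JSON array of records and keep only complete source/target pairs."""
    records = json.loads(raw)
    pairs = []
    for rec in records:
        src = str(rec.get(src_key, "")).strip()
        tgt = str(rec.get(tgt_key, "")).strip()
        # Skip records where either side is missing or empty.
        if src and tgt:
            pairs.append({"source": src, "target": tgt})
    return pairs


pairs = load_pairs(SAMPLE_JSON)
print(f"kept {len(pairs)} of 3 records")
```

Dropping incomplete records before tokenization keeps empty targets out of training; with a different schema, only the two key arguments would need to change.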

Performance

The model was evaluated on a validation set to check translation quality. It is tuned for colloquial Moroccan Arabic; fine-tuning on additional data may further improve its output.

Limitations

Dataset Size: The custom dataset consists of 21,000 samples, which may limit coverage of diverse expressions and rare terms.
Colloquial Variations: Moroccan Arabic has many dialectal variations, which might not all be covered equally.

How to Use

You can use the model with the transformers library as follows:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tachicart/nllb-ft-darija")
model = AutoModelForSeq2SeqLM.from_pretrained("tachicart/nllb-ft-darija")

# Example translation: tokenize a Darija sentence, generate, then decode the MSA output
inputs = tokenizer("كيفاش نقدر نربح بزاف ديال الفلوس بالزربة", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
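
To translate more than one sentence, the single-sentence call above can be extended to batches. The sketch below is one possible way to do that, not part of the original card: the helper names `batched` and `translate_all` and the default values for `batch_size` and `max_new_tokens` are assumptions. It expects the `tokenizer` and `model` objects loaded as shown above.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_all(sentences: Sequence[str], tokenizer, model,
                  batch_size: int = 8, max_new_tokens: int = 128) -> List[str]:
    """Translate sentences batch by batch with an already loaded tokenizer and model."""
    translations: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch so it forms one tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

A call such as `translate_all(list_of_darija_sentences, tokenizer, model)` returns one output string per input, in the same order. Smaller batch sizes reduce memory use at the cost of throughput.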

Model size: 0.6B params (Safetensors, tensor type F32)