datasets:
  - AfnanTS/Final_ArLAMA_DS_tokenized_for_ARBERTv2
language:
  - ar
base_model:
  - UBC-NLP/ARBERTv2
pipeline_tag: fill-mask

ArBERTV1_MLM is an Arabic language model produced by continued pre-training with a Masked Language Modeling (MLM) objective on natural-language text derived from Knowledge Graphs (KGs). The goal is to capture semantic relations in Arabic text and to improve vocabulary comprehension and performance on downstream tasks.

Uses

Direct Use

Filling masked tokens in Arabic text, particularly in contexts enriched with knowledge from KGs.

Downstream Use

Can be further fine-tuned for Arabic NLP tasks that require semantic understanding, such as text classification or question answering.
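
As a starting point for such fine-tuning, the checkpoint could be loaded with a classification head as sketched below. The helper name `load_for_classification` and the default of two labels are illustrative assumptions, not part of this model card. The sketch also assumes the repository ships tokenizer files alongside the weights. The classification head is newly initialized, so the model needs training on labeled data before its predictions mean anything.

```python
def load_for_classification(num_labels: int = 2, model_id: str = "AfnanTS/ARBERT_ArLAMA"):
    """Load the checkpoint with a fresh sequence-classification head.

    `num_labels=2` is a placeholder; set it to the number of classes in your task.
    """
    # Imported lazily so that defining this helper does not load the library.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The encoder weights come from the checkpoint; the classifier layer is random.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=num_labels
    )
    return tokenizer, model
```

Calling `load_for_classification(num_labels=3)` would download the weights and return a tokenizer and model ready to pass to a standard training loop.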

How to Get Started with the Model

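from transformers import pipeline

# Load a fill-mask pipeline backed by this model
fill_mask = pipeline("fill-mask", model="AfnanTS/ARBERT_ArLAMA")
# Predict candidate tokens for the [MASK] position
fill_mask("اللغة [MASK] مهمة جدا.")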
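
To inspect the predictions, it can help to reduce the pipeline output to token and score pairs. The sketch below assumes a single `[MASK]` in the input, in which case the fill-mask pipeline returns a list of dictionaries containing `token_str` and `score`. The helper names `top_predictions` and `demo` are illustrative and not part of this repository.

```python
from typing import Callable, List, Tuple


def top_predictions(fill_mask: Callable, text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for the k most likely fillers of one [MASK]."""
    # For a single masked position the pipeline returns a list of dicts,
    # each carrying "token_str" and "score" among other keys.
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], float(r["score"])) for r in results]


def demo() -> None:
    # Imported here so the helper above can be reused without loading the model.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="AfnanTS/ARBERT_ArLAMA")
    for token, score in top_predictions(fill_mask, "اللغة [MASK] مهمة جدا."):
        print(f"{token}\t{score:.4f}")
```

Calling `demo()` downloads the model on first use and prints one candidate per line.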

Training Details

Training Data

Trained on the ArLAMA dataset, which is designed to represent Knowledge Graphs in natural language.

Training Procedure

Continued pre-training of ArBERTv1 using Masked Language Modeling (MLM) to integrate KG-based knowledge.
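
The MLM objective can be illustrated with a toy masking routine. This card does not state the masking settings used, so the sketch assumes the standard BERT recipe: select about 15% of tokens, then replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged. The function name, the whitespace tokenization, and the example sentence ("Cairo is the capital of Egypt", a hand-written verbalized triple, not taken from ArLAMA) are all illustrative.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str], vocab: List[str], mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (inputs, labels); labels are None where no loss is computed."""
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected, so it contributes no loss
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


# Hypothetical verbalized triple (Cairo, capital of, Egypt); not from ArLAMA.
tokens = "القاهرة عاصمة مصر".split()
inputs, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.5, seed=1)
```

In real training the same idea operates on subword IDs rather than whitespace tokens, and library utilities typically handle the masking.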