---
datasets:
- AfnanTS/Final_ArLAMA_DS_tokenized_for_ARBERTv2
language:
- ar
base_model:
- UBC-NLP/ARBERTv2
pipeline_tag: fill-mask
---

<img src="./arab_icon2.png" alt="Model Logo" width="30%" height="30%" align="right"/>

**ARBERTv2_ArLAMA** is a transformer-based Arabic language model obtained by continuing the pre-training of ARBERTv2 with a Masked Language Modeling (MLM) objective. Its training text is derived from Knowledge Graphs (KGs), with the aim of strengthening the model's grasp of semantic relations and improving its performance on Arabic NLP tasks.

## Uses

### Direct Use

Filling masked tokens in Arabic text, particularly in sentences that express factual knowledge of the kind stored in KGs.

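
As an illustration of a KG-style query, a fact can be phrased as a sentence whose object slot is masked. The sketch below is an assumption, not the procedure used to build ArLAMA: `build_cloze_prompt`, `query_model`, and the template string are hypothetical names introduced here, and only the `[MASK]` token and the model ID come from this card.

```python
def build_cloze_prompt(template: str, subject: str, mask_token: str = "[MASK]") -> str:
    """Fill a relation template with a subject and mask the object slot."""
    return template.format(subject=subject, object=mask_token)


def query_model(prompt: str, top_k: int = 5):
    """Run the prompt through the fill-mask pipeline (downloads the model)."""
    from transformers import pipeline  # imported lazily; only needed for inference

    fill_mask = pipeline("fill-mask", model="AfnanTS/ARBERTv2_ArLAMA")
    return fill_mask(prompt, top_k=top_k)


# Hypothetical "capital of" template: "The capital of {subject} is {object}."
CAPITAL_TEMPLATE = "عاصمة {subject} هي {object}."

prompt = build_cloze_prompt(CAPITAL_TEMPLATE, "مصر")
print(prompt)
```

Calling `query_model(prompt)` would then return the model's candidate fills for the masked object.
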
### Downstream Use

The model can be further fine-tuned for Arabic NLP tasks that require semantic understanding, such as text classification or question answering.

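## How to Get Started with the Model

```python
from transformers import pipeline

# Load the fill-mask pipeline with this model
fill_mask = pipeline("fill-mask", model="AfnanTS/ARBERTv2_ArLAMA")

# Predict the masked token; the result is a list of candidate fills with scores
predictions = fill_mask("اللغة [MASK] مهمة جدا.")
print(predictions)
```
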
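
For a single `[MASK]`, the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below shows one way to reduce that list to ranked `(token, score)` pairs. The helper `summarize_predictions` is a hypothetical name, and `example_output` holds made-up placeholder values in the pipeline's format, not real predictions from this model.

```python
def summarize_predictions(predictions, min_score=0.0):
    """Return (token_str, score) pairs at or above min_score, best first."""
    pairs = [
        (p["token_str"], round(p["score"], 4))
        for p in predictions
        if p["score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only (NOT actual model output)
example_output = [
    {"score": 0.25, "token": 0, "token_str": "الإنجليزية", "sequence": "اللغة الإنجليزية مهمة جدا."},
    {"score": 0.60, "token": 0, "token_str": "العربية", "sequence": "اللغة العربية مهمة جدا."},
    {"score": 0.01, "token": 0, "token_str": "هي", "sequence": "اللغة هي مهمة جدا."},
]

ranked = summarize_predictions(example_output, min_score=0.05)
for token_str, score in ranked:
    print(token_str, score)
```

With a real pipeline, the same helper could be applied to `fill_mask(text, top_k=10)` to keep only the more confident candidates.
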
## Training Details

### Training Data

The model was trained on the ArLAMA dataset, which is designed to represent Knowledge Graph content in natural language. The tokenized version used for this model is `AfnanTS/Final_ArLAMA_DS_tokenized_for_ARBERTv2`.

### Training Procedure

Continued pre-training of ARBERTv2 with a Masked Language Modeling (MLM) objective, integrating structured knowledge from Knowledge Graphs.
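
To make the MLM objective concrete, the following is a minimal pure-Python sketch of BERT-style token masking. This card does not state the masking rate or other hyperparameters, so the 15% selection rate and the 80/10/10 split (mask, random token, unchanged) are the conventional BERT defaults, used here as assumptions. `mask_tokens` is a hypothetical helper, not the training code.

```python
import random


def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, vocab=None, seed=None):
    """Return (corrupted, labels) for one sequence using BERT-style masking.

    labels[i] is the original token at positions selected for prediction
    and None everywhere else.
    """
    rng = random.Random(seed)
    vocab = vocab or tokens  # fall back to the sequence itself for random swaps
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(mask_token)         # 80%: replace with [MASK]
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(token)              # 10%: keep the original token
        else:
            labels.append(None)  # position is ignored by the loss
            corrupted.append(token)
    return corrupted, labels


# Illustrative sentence: "The capital of Egypt is Cairo ."
tokens = "عاصمة مصر هي القاهرة .".split()
corrupted, labels = mask_tokens(tokens, seed=0)
print(corrupted)
print(labels)
```

In practice this step is usually handled by a library data collator operating on token IDs; the sketch only shows the selection logic on word-level tokens.
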