Update README.md
Removed (previous auto-generated model card):

<!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->

## Model description

More information needed

### Training hyperparameters

- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
Added:

---
datasets:
- AfnanTS/Final_ArLAMA_DS_tokenized_for_ARBERTv2
language:
- ar
base_model:
- UBC-NLP/ARBERTv2
pipeline_tag: fill-mask
---

<img src="./Arab_BERT2.jpeg" alt="Model Logo" width="30%" height="30%" align="right"/>

**ARBERTv2_ArLAMA** is a transformer-based Arabic language model fine-tuned with a Masked Language Modeling (MLM) objective. The model uses Knowledge Graphs (KGs) to enhance its understanding of semantic relations and to improve its performance on various Arabic NLP tasks.

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

Filling masked tokens in Arabic text, particularly in contexts enriched with knowledge from KGs.
### Downstream Use

Can be further fine-tuned for Arabic NLP tasks that require semantic understanding, such as text classification or question answering.
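As a sketch of such downstream use, the checkpoint can be loaded with a freshly initialized task head via the standard `transformers` auto classes. The three-class classification setup below is purely illustrative, not a task this model ships with:

```python
# Hypothetical sketch: load ARBERTv2_ArLAMA with a sequence-classification
# head for further fine-tuning (the 3-label setup is illustrative only).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "AfnanTS/ARBERTv2_ArLAMA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# The encoder weights come from ARBERTv2_ArLAMA; the classification head is
# randomly initialized and must still be trained on labeled task data.
inputs = tokenizer("اللغة العربية مهمة جدا.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 3])
```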
## How to Get Started with the Model

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="AfnanTS/ARBERTv2_ArLAMA")
fill_mask("اللغة [MASK] مهمة جدا.")
```
## Training Details

### Training Data

Trained on the ArLAMA dataset, which is designed to represent Knowledge Graphs in natural language.
### Training Procedure

Continued pre-training of ARBERTv2 with a Masked Language Modeling (MLM) objective, integrating structured knowledge from Knowledge Graphs.
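The continued pre-training setup above can be sketched with the standard 🤗 `Trainer` MLM recipe. The hyperparameter values below are illustrative, and the training dataset is omitted:

```python
# Sketch of continued pre-training from ARBERTv2 with masked language
# modeling; hyperparameter values here are illustrative assumptions.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/ARBERTv2")
model = AutoModelForMaskedLM.from_pretrained("UBC-NLP/ARBERTv2")

# Dynamic masking: 15% of input tokens are selected for masking on the fly.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="arbertv2-arlama",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=5,
)

# A Trainer would combine these with the tokenized ArLAMA dataset:
# trainer = Trainer(model=model, args=args, data_collator=collator,
#                   train_dataset=...)  # tokenized ArLAMA split
# trainer.train()
```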