|
--- |
|
license: mit |
|
language: |
|
- ar |
|
pipeline_tag: text-generation |
|
tags: |
|
- arabic |
|
- text-generation |
|
--- |
|
Model Description |
|
Model Name: ArabicGPT-S |
|
Architecture: GPT-2 |
|
Layers: 12 |
|
Model Size: 134M |
|
Context Window Size: 768 |
|
|
|
ArabianGPT is a custom-trained version of the GPT-2 base model, specifically tailored for the Arabic language. It is designed to understand and generate Arabic text, making it suitable for various natural language processing tasks in Arabic. |
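As a rough sanity check on the figures listed above, the parameter count of a GPT-2 style network can be computed from its shape. This is a sketch under stated assumptions, not the authors' accounting: it assumes the standard GPT-2 base hidden size of 768, a vocabulary of exactly 64,000 entries, learned position embeddings over the 768-token context window, and an output layer that shares weights with the token embedding. The function name `gpt2_param_count` is illustrative.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int, n_ctx: int) -> int:
    """Count trainable parameters of a GPT-2 style decoder with tied embeddings."""
    # Token embedding matrix plus learned position embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Attention: fused QKV projection and output projection (weights + biases).
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to 4 * d_model, then project back (weights + biases).
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + mlp + layer_norms
    # Final layer norm after the last block.
    final_layer_norm = 2 * d_model
    return embeddings + n_layer * per_layer + final_layer_norm


# Assumed shape: 12 layers, hidden size 768, 64,000-entry vocabulary, 768-token context.
total = gpt2_param_count(n_layer=12, d_model=768, vocab_size=64_000, n_ctx=768)
print(f"{total:,}")  # 134,797,824
```

Under these assumptions the total is about 134.8M, which is consistent with the 134M model size stated above.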
|
|
|
Training |
|
Dataset: Abu Elkhiar Corpus |
|
Size: 15.5 GB |
|
Number of Words: 237,814,541 |
|
Number of Tokens: 1,752,421,071 |
|
Epochs: 5.87 |
|
Loss: 3.97 |
|
|
|
The model was trained on the Abu Elkhiar dataset, a comprehensive Arabic text corpus encompassing a wide range of topics. The training process focused on adapting the model to understand the nuances and complexities of the Arabic language. |
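The training figures above can be combined into two derived quantities. The sketch below assumes that the reported token count is the size of one pass over the corpus and that the reported loss is a mean per-token cross-entropy measured in nats; neither is stated explicitly in this card, so the results are indicative only.

```python
import math

tokens_per_epoch = 1_752_421_071  # "Number of Tokens" from the card
epochs = 5.87                     # "Epochs" from the card
final_loss = 3.97                 # "Loss" from the card, assumed to be in nats

# Total tokens processed during training, assuming one epoch is one full pass.
tokens_seen = tokens_per_epoch * epochs

# Perplexity implied by the loss, assuming mean per-token cross-entropy.
perplexity = math.exp(final_loss)

print(f"tokens seen: {tokens_seen / 1e9:.2f}B")
print(f"perplexity:  {perplexity:.1f}")
```

Under these assumptions the model saw roughly 10.29 billion tokens during training, and a loss of 3.97 corresponds to a perplexity of about 53.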
|
|
|
Tokenizer |
|
Type: Custom-trained SentencePiece tokenizer |
|
Vocabulary Size: 64K |
|
|
|
We employed AraNizer, a custom-trained tokenizer based on the SentencePiece model, with a vocabulary size of 64K. This choice was made to optimize the model's performance for the specific characteristics of the Arabic language. |
|
|
|
Usage |
|
ArabianGPT can be used for Arabic text generation. |
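A minimal sketch of text generation with the Hugging Face `transformers` library follows. The repository ID in `MODEL_ID` is a hypothetical placeholder, since this card does not state where the checkpoint is published, and the helper names and sampling settings (`max_new_tokens`, `top_p`) are illustrative choices rather than recommended values.

```python
# Hypothetical placeholder: replace with the actual Hub repository ID of the model.
MODEL_ID = "your-namespace/ArabicGPT-S"


def load_generator(model_id: str = MODEL_ID):
    """Build a text-generation pipeline; downloads the model on first use."""
    # Imported here so the rest of the module works without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id)


def generate(generator, prompt: str, max_new_tokens: int = 50) -> str:
    """Return the prompt followed by a sampled continuation."""
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on newly generated tokens
        do_sample=True,                 # sample instead of greedy decoding
        top_p=0.95,                     # nucleus sampling threshold
    )
    # The pipeline returns a list with one dict per generated sequence.
    return outputs[0]["generated_text"]


# Example usage (requires network access and the real repository ID):
# generator = load_generator()
# print(generate(generator, "مرحبا بكم في"))
```

The prompt and the generated continuation together should stay within the 768-token context window listed in the model description.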
|
|
|
Limitations |
|
As with any language model, ArabianGPT may misinterpret context or generate unsuitable text in some scenarios. Users should be aware of these limitations and use the model accordingly. |
|
|
|
Ethical Considerations |
|
We emphasize responsible usage of ArabianGPT. Users should ensure that the generated text is used ethically and does not propagate misinformation or harmful content. |
|
|
|
Citation |
|
If you use ArabianGPT in your research or application, please cite it as follows: |
|
|
|
@misc{ArabianGPT2023, |
|
title={ArabianGPT: A GPT-2 Based Language Model for Arabic}, |
|
author={Najar, Omar and Sibaee, Serry and Ghouti, Lahouari and Koubaa, Anis}, |
|
affiliation={Prince Sultan University, Riyadh, Saudi Arabia}, |
|
year={2023}, |
|
} |
|
|
|
|
|
Acknowledgments |
|
We thank Prince Sultan University, especially the Robotics and Internet of Things Lab, for their support. |
|
|
|
Contact |
|
For inquiries regarding ArabicGPT-S, please contact onajar@psu.edu.sa |