---
license: mit
language:
- ar
pipeline_tag: text-generation
tags:
- arabic
- text-generation
---
Model Description
Model Name: ArabicGPT-S
Architecture: GPT-2
Layers: 12
Model Size: 134M
Context Window Size: 768
ArabianGPT is a custom-trained version of the GPT-2 base model, specifically tailored for the Arabic language. It is designed to understand and generate Arabic text, making it suitable for various natural language processing tasks in Arabic.
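
As a rough sanity check on the figures above, the parameter count of a GPT-2 style decoder can be reproduced from its dimensions. The sketch below assumes a hidden size of 768, the standard GPT-2 base layout (4x MLP expansion, tied input and output embeddings), and a vocabulary of exactly 64,000 entries. None of these three values is stated on this card, so treat them as assumptions rather than the authors' configuration.

```python
def gpt2_param_count(n_layer, n_embd, n_positions, vocab_size):
    """Count parameters of a GPT-2 style decoder with tied embeddings."""
    # Per block: fused QKV projection and attention output projection.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Per block: two MLP projections with a 4x hidden expansion.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Per block: two LayerNorms, each with a weight and a bias vector.
    layer_norms = 2 * (2 * n_embd)
    per_block = attn + mlp + layer_norms

    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * n_embd + n_positions * n_embd
    final_layer_norm = 2 * n_embd
    return n_layer * per_block + embeddings + final_layer_norm


# Layers and context window come from the card; n_embd and the exact
# vocabulary size are assumptions.
total = gpt2_param_count(n_layer=12, n_embd=768, n_positions=768, vocab_size=64_000)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the count comes to roughly 134.8M, which is in line with the 134M listed above.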
Training
Dataset: Abu Elkhiar Corpus
Size: 15.5 GB
Number of Words: 237,814,541
Number of Tokens: 1,752,421,071
Epochs: 5.87
Loss: 3.97
The model was trained on the Abu Elkhiar dataset, a comprehensive Arabic text corpus encompassing a wide range of topics. The training process focused on adapting the model to understand the nuances and complexities of the Arabic language.
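
Two derived quantities follow from the training figures above: the total number of tokens processed and the perplexity implied by the final loss. The short calculation below assumes the reported loss is a mean per-token cross-entropy measured in nats, which this card does not state explicitly.

```python
import math

# Figures taken from the training summary above.
tokens_in_corpus = 1_752_421_071
epochs = 5.87
final_loss = 3.97

# Total tokens processed over all epochs.
tokens_seen = tokens_in_corpus * epochs

# Perplexity is exp(loss) when the loss is cross-entropy in nats (assumption).
perplexity = math.exp(final_loss)

print(f"tokens seen: {tokens_seen / 1e9:.2f}B")
print(f"perplexity:  {perplexity:.1f}")
```

This gives about 10.29B tokens processed and a perplexity of about 53.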
Tokenizer:
Type: Custom-trained SentencePiece tokenizer
Vocabulary Size: 64K
We employed AraNizer, a custom-trained tokenizer based on the SentencePiece model, with a vocabulary size of 64K. This choice was made to optimize the model's performance for the specific characteristics of the Arabic language.
Usage
ArabianGPT can be used for Arabic text generation.
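
A minimal sketch of text generation with the Hugging Face `transformers` library is shown here. The `MODEL_ID` value is a placeholder, not the real repository name, and the sketch assumes the checkpoint and its tokenizer load through `AutoModelForCausalLM` and `AutoTokenizer`, which this card does not confirm. The helper `max_new_tokens_for` is a hypothetical name introduced only to keep the prompt plus the generated text within the 768-token context window.

```python
CONTEXT_WINDOW = 768  # context window size listed in the model description
MODEL_ID = "path-or-hub-id-of-ArabianGPT"  # placeholder, replace with the real ID


def max_new_tokens_for(prompt_token_count, requested=100, context_window=CONTEXT_WINDOW):
    """Cap generated tokens so that prompt + output fits the context window."""
    remaining = context_window - prompt_token_count
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, remaining)


def generate(prompt, model_id=MODEL_ID, requested=100):
    """Generate a continuation of an Arabic prompt (sketch, see assumptions above)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    budget = max_new_tokens_for(inputs["input_ids"].shape[1], requested)

    # Nucleus sampling; the sampling settings are illustrative, not tuned.
    output_ids = model.generate(
        **inputs, max_new_tokens=budget, do_sample=True, top_p=0.95
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate` with an Arabic prompt string returns the prompt followed by the sampled continuation. Because sampling is enabled, repeated calls generally produce different text.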
Limitations
As with any language model, ArabianGPT may have limitations in understanding context or generating text in certain scenarios. Users should be aware of these limitations and use the model accordingly.
Ethical Considerations
We emphasize responsible usage of ArabianGPT. Users should ensure that the generated text is used ethically and does not propagate misinformation or harmful content.
Citation
If you use ArabianGPT in your research or application, please cite it as follows:
@misc{ArabianGPT2023,
title={ArabianGPT: A GPT-2 Based Language Model for Arabic},
author={Najar, Omar and Sibaee, Serry and Ghouti, Lahouari and Koubaa, Anis},
affiliation={Prince Sultan University, Riyadh, Saudi Arabia},
year={2023},
}
Acknowledgments
We thank Prince Sultan University, especially the Robotics and Internet of Things Lab, for their support.
Contact
For inquiries regarding ArabicGPT-S, please contact onajar@psu.edu.sa