Ebtihal
/

AraBertMo_base_V5

Inference Endpoints

Model card Files Files and versions Community

AraBertMo_base_V5 / README.md

Ebtihal's picture

Update README.md

b2a00bc over 2 years ago

|

1.78 kB

	---
	language: ar
	tags: Fill-Mask
	datasets: OSCAR
	widget:
	- text: " السلام عليكم ورحمة[MASK] وبركاتة"
	- text: " اهلا وسهلا بكم في [MASK] من سيربح المليون"
	- text: " مرحبا بك عزيزي الزائر [MASK] موقعنا "

	---

	# Arabic BERT Model
	AraBERTMo is an Arabic pre-trained language model based on [Google's BERT architechture](https://github.com/google-research/bert).
	AraBERTMo_base uses the same BERT-Base config.
	AraBERTMo_base now comes in 10 new variants
	All models are available on the `HuggingFace` model page under the [Ebtihal](https://huggingface.co/Ebtihal/) name.
	Checkpoints are available in PyTorch formats.

	## Pretraining Corpus
	`AraBertMo_base_V5' model was pre-trained on ~3 million words:
	- [OSCAR](https://traces1.inria.fr/oscar/) - Arabic version "unshuffled_deduplicated_ar".

	## Training results
	this model achieves the following results:

	\| Task \| Num examples \| Num Epochs \| Batch Size \| steps \| Wall time \| training loss\|
	\|:----:\|:----:\|:----:\|:----:\|:-----:\|:----:\|:-----:\|
	\| Fill-Mask\| 50046\| 5 \| 64 \| 3910 \| 6h 49m 59s \| 7.4599 \|

	## Load Pretrained Model
	You can use this model by installing `torch` or `tensorflow` and Huggingface library `transformers`. And you can use it directly by initializing it like this:
	```python
	from transformers import AutoTokenizer, AutoModel
	tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V5")
	model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V5")
	```

	## This model was built for master's degree research in an organization:
	- [University of kufa](https://uokufa.edu.iq/).
	- [Faculty of Computer Science and Mathematics](https://mathcomp.uokufa.edu.iq/).
	- Department of Computer Science