|
|
|
# text_generation_bangla_model |
|
The BanglaCLM dataset (≈26.2GB in total) combines the following sources:

- OSCAR: 12.84GB
- Wikipedia dump: 6.24GB
- ProthomAlo: 3.92GB
- Kalerkantho: 3.24GB
|
|
|
|
|
## Model description

A GPT-2-style causal language model for Bangla text generation, pretrained on the BanglaCLM corpus (see the citation below).

- Context size: 128
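
A minimal usage sketch with the Transformers TensorFlow API is shown below. The repository id is a placeholder; substitute the actual Hugging Face id under which this checkpoint is published.

```python
# Minimal generation sketch (TensorFlow). The model id below is a placeholder;
# replace it with the repository id under which this checkpoint is hosted.
from transformers import AutoTokenizer, TFAutoModelForCausalLM

model_id = "your-username/text_generation_bangla_model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForCausalLM.from_pretrained(model_id)

prompt = "বাংলাদেশের রাজধানী"  # "The capital of Bangladesh"
inputs = tokenizer(prompt, return_tensors="tf")

# Keep prompt + generated tokens within the 128-token context window.
outputs = model.generate(
    **inputs,
    max_length=128,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```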
|
|
|
|
|
## Training and evaluation data |
|
The BanglaCLM dataset is divided into a training set (90%) and a validation set (10%).
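
The exact splitting procedure is not documented beyond the 90%/10% ratio. A minimal sketch with the `datasets` library (the file name and seed are illustrative):

```python
# Illustrative 90/10 split of a raw text corpus; file name and seed are placeholders.
from datasets import load_dataset

corpus = load_dataset("text", data_files={"train": "banglaclm.txt"})["train"]
split = corpus.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = split["train"], split["test"]
print(f"train: {len(train_ds):,} lines, validation: {len(val_ds):,} lines")
```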
|
|
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training:

- Batch size: 32
- Initial learning rate: 5e-5
- Number of warmup steps: 10,000
- Weight decay rate: 0.01
- Tokenization algorithm: BPE
- Tokenizer vocabulary size: 50,256
- Total trainable params: 124,439,808
- Epochs: 40
- Number of training steps: 40,772,228
- Training precision: float32
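
The original training script is not included in this card. The sketch below only illustrates how the hyperparameters above could be wired together with the Transformers TensorFlow utilities; `create_optimizer` implements the linear warmup and weight decay listed above.

```python
# Illustrative setup matching the hyperparameters above (not the original script).
from transformers import GPT2Config, TFGPT2LMHeadModel, create_optimizer

config = GPT2Config(
    vocab_size=50256,  # tokenizer vocabulary size
    n_positions=128,   # context size
)
model = TFGPT2LMHeadModel(config)  # roughly 124M trainable parameters

optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=40_772_228,
    num_warmup_steps=10_000,
    weight_decay_rate=0.01,
)

# Loss is computed internally by the model when no loss is passed to compile().
model.compile(optimizer=optimizer)
# model.fit(tf_train_dataset, validation_data=tf_val_dataset, epochs=40)
```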
|
|
|
|
|
### Training results |
|
|
|
Perplexity score: 2.86.
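
Perplexity here is presumably the exponential of the mean token-level cross-entropy (in nats) over the validation set; a score of 2.86 corresponds to a mean loss of roughly 1.05:

```python
# Perplexity = exp(mean cross-entropy per token). Shown only to make the
# metric explicit; the loss value is a placeholder, not a measured number.
import math

mean_validation_loss = 1.05
perplexity = math.exp(mean_validation_loss)
print(f"perplexity = {perplexity:.2f}")  # exp(1.05) ≈ 2.86
```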
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.26.1 |
|
- TensorFlow 2.11.0 |
|
- Datasets 2.10.0 |
|
- Tokenizers 0.13.2 |
|
|
|
### Citation |
|
If you find this model helpful, please cite the following paper:
|
```
@INPROCEEDINGS{10303383,
  author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
  booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)},
  title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language},
  year={2023},
  pages={56-59},
  doi={10.1109/ICICT4SD59951.2023.10303383}}
```
|
|