# text_generation_bangla_model
The BanglaCLM corpus (≈26.24 GB in total) combines four sources:
- OSCAR: 12.84 GB
- Wikipedia dump: 6.24 GB
- ProthomAlo: 3.92 GB
- Kalerkantho: 3.24 GB
## Model description
- Context size: 128
## Training and evaluation data
The BanglaCLM dataset is divided into a training set (90%) and a validation set (10%).
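The 90/10 split can be sketched as follows. This is an illustrative reconstruction, not the authors' actual preprocessing code; `corpus` and the seed are placeholders.

```python
import random

def split_corpus(corpus, valid_fraction=0.1, seed=42):
    """Shuffle a list of documents and split it into train/validation sets.

    Illustrative only: the seed and shuffling strategy are assumptions,
    not details stated in the model card.
    """
    rng = random.Random(seed)
    docs = list(corpus)
    rng.shuffle(docs)
    n_valid = int(len(docs) * valid_fraction)
    return docs[n_valid:], docs[:n_valid]

# Example with 100 dummy documents: 90 train, 10 validation.
train_docs, valid_docs = split_corpus([f"doc{i}" for i in range(100)])
```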
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- Batch size: 32
- Initial learning rate: 5e-5
- Number of warmup steps: 10000
- Weight decay rate: 0.01
- Tokenization algorithm: BPE
- Vocabulary size of tokenizer: 50256
- Total trainable params: 124,439,808
- Epochs: 40
- Number of training steps: 40772228
- training_precision: float32
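The learning-rate schedule implied by the hyperparameters above (initial rate 5e-5, 10,000 warmup steps) can be sketched as a warmup-then-decay function. The linear decay shape after warmup is an assumption for illustration; the card does not state the exact schedule.

```python
def lr_at_step(step, init_lr=5e-5, warmup_steps=10_000, total_steps=40_772_228):
    """Learning rate at a given training step.

    Linear warmup from 0 to `init_lr` over `warmup_steps`, then linear
    decay to 0 over the remaining steps. The decay shape is an assumed
    convention, not confirmed by the model card.
    """
    if step < warmup_steps:
        return init_lr * step / warmup_steps
    remaining = total_steps - step
    return init_lr * max(0.0, remaining / (total_steps - warmup_steps))
```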
### Training results
The model achieved a perplexity score of 2.86.
### Framework versions
- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.10.0
- Tokenizers 0.13.2
### Citation
If you find this model helpful, please cite the following paper:
```
@INPROCEEDINGS{10303383,
author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)},
title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language},
year={2023},
volume={},
number={},
pages={56-59},
doi={10.1109/ICICT4SD59951.2023.10303383}}
```