|
|
|
# text_generation_bangla_model |
|
The BanglaCLM dataset (≈26.2GB in total) combines the following sources:

- OSCAR: 12.84GB
- Wikipedia dump: 6.24GB
- ProthomAlo: 3.92GB
- Kalerkantho: 3.24GB
|
|
|
|
|
## Model description

A GPT-2-style causal language model for Bangla text generation, pretrained on the BanglaCLM corpus (see the citation below).

- Context size: 128
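
A minimal usage sketch with the Transformers TensorFlow API is shown below. The repository id is a placeholder; substitute the actual Hugging Face id under which this checkpoint is published.

```python
# Minimal generation sketch (TensorFlow). The model id below is a placeholder;
# replace it with the repository id under which this checkpoint is hosted.
from transformers import AutoTokenizer, TFAutoModelForCausalLM

model_id = "your-username/text_generation_bangla_model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForCausalLM.from_pretrained(model_id)

prompt = "বাংলাদেশের রাজধানী"  # "The capital of Bangladesh"
inputs = tokenizer(prompt, return_tensors="tf")

# Keep prompt + generated tokens within the 128-token context window.
outputs = model.generate(
    **inputs,
    max_length=128,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```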
|
|
|
|
|
## Training and evaluation data |
|
The BanglaCLM dataset is divided into a training set (90%) and a validation set (10%).
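
The exact splitting procedure is not documented beyond the 90%/10% ratio. A minimal sketch with the `datasets` library (the file name and seed are illustrative):

```python
# Illustrative 90/10 split of a raw text corpus; file name and seed are placeholders.
from datasets import load_dataset

corpus = load_dataset("text", data_files={"train": "banglaclm.txt"})["train"]
split = corpus.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = split["train"], split["test"]
print(f"train: {len(train_ds):,} lines, validation: {len(val_ds):,} lines")
```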
|
|
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training:

- Batch size: 32
- Initial learning rate: 5e-5
- Number of warmup steps: 10,000
- Weight decay rate: 0.01
- Tokenization algorithm: BPE
- Tokenizer vocabulary size: 50,256
- Total trainable params: 124,439,808
- Epochs: 40
- Number of training steps: 40,772,228
- Training precision: float32
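
The original training script is not included in this card. The sketch below only illustrates how the hyperparameters above could be wired together with the Transformers TensorFlow utilities; `create_optimizer` implements the linear warmup and weight decay listed above.

```python
# Illustrative setup matching the hyperparameters above (not the original script).
from transformers import GPT2Config, TFGPT2LMHeadModel, create_optimizer

config = GPT2Config(
    vocab_size=50256,  # tokenizer vocabulary size
    n_positions=128,   # context size
)
model = TFGPT2LMHeadModel(config)  # roughly 124M trainable parameters

optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=40_772_228,
    num_warmup_steps=10_000,
    weight_decay_rate=0.01,
)

# Loss is computed internally by the model when no loss is passed to compile().
model.compile(optimizer=optimizer)
# model.fit(tf_train_dataset, validation_data=tf_val_dataset, epochs=40)
```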
|
|
|
|
|
### Training results |
|
|
|
Perplexity score: 2.86.
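
Perplexity here is presumably the exponential of the mean token-level cross-entropy (in nats) over the validation set; a score of 2.86 corresponds to a mean loss of roughly 1.05:

```python
# Perplexity = exp(mean cross-entropy per token). Shown only to make the
# metric explicit; the loss value is a placeholder, not a measured number.
import math

mean_validation_loss = 1.05
perplexity = math.exp(mean_validation_loss)
print(f"perplexity = {perplexity:.2f}")  # exp(1.05) ≈ 2.86
```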
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.26.1 |
|
- TensorFlow 2.11.0 |
|
- Datasets 2.10.0 |
|
- Tokenizers 0.13.2 |
|
|
|
### Citation |
|
If you find this model helpful, please cite the following paper:
|
```
@INPROCEEDINGS{10303383,
  author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
  booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)},
  title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language},
  year={2023},
  pages={56-59},
  doi={10.1109/ICICT4SD59951.2023.10303383}}
```
|
|