
text_generation_bangla_model

BanglaCLM dataset (≈26.24 GB in total):

  • OSCAR: 12.84 GB

  • Wikipedia dump: 6.24 GB

  • ProthomAlo: 3.92 GB

  • Kalerkantho: 3.24 GB

Model description

  • Context size: 128
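The card does not include a usage snippet; the sketch below shows how a causal language model with this 128-token context could be loaded and sampled from with the Transformers library. The repo id `<user>/text_generation_bangla_model` is a placeholder rather than the model's actual Hub path, so this is illustrative only:

```python
# Hypothetical usage sketch: "<user>/text_generation_bangla_model" is a
# placeholder -- substitute this model's actual Hugging Face repo id.
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("<user>/text_generation_bangla_model")
model = TFAutoModelForCausalLM.from_pretrained("<user>/text_generation_bangla_model")

inputs = tokenizer("বাংলাদেশ", return_tensors="tf")
# Keep generation within the model's 128-token context window.
output_ids = model.generate(**inputs, max_length=128, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```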

Training and evaluation data

The BanglaCLM dataset is divided into a training set (90%) and a validation set (10%).
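The card does not say how the split was performed; a minimal sketch of a seeded 90/10 split over a list of documents might look like this (the `train_val_split` helper and the toy `docs` corpus are illustrative, not from the original training code):

```python
import random

def train_val_split(examples, val_fraction=0.10, seed=42):
    """Shuffle indices with a fixed seed, then split into train/validation."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    n_val = int(len(examples) * val_fraction)
    val_idx = set(idx[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_idx]
    val = [ex for i, ex in enumerate(examples) if i in val_idx]
    return train, val

docs = [f"doc-{i}" for i in range(1000)]
train, val = train_val_split(docs)
print(len(train), len(val))  # 900 100
```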

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • Batch size: 32

  • Initial learning rate: 5e-5

  • Number of warmup steps: 10000

  • Weight decay rate: 0.01

  • Tokenization algorithm: BPE

  • Vocabulary size of tokenizer: 50256

  • Total trainable params: 124,439,808

  • Epochs: 40

  • Number of training steps: 40,772,228

  • Training precision: float32
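The hyperparameters specify 10,000 warmup steps and an initial learning rate of 5e-5, but not the shape of the schedule. A common choice is linear warmup, sketched below; holding the rate constant after warmup is an assumption here, since the decay policy is not stated in the card:

```python
def lr_at_step(step, base_lr=5e-5, warmup_steps=10_000):
    """Linear warmup to base_lr; constant afterwards (decay not specified)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

print(lr_at_step(5_000))   # halfway through warmup: 2.5e-05
print(lr_at_step(20_000))  # past warmup: 5e-05
```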

Training results

Perplexity score: 2.86
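For reference, a perplexity of 2.86 corresponds to a mean token-level cross-entropy of about 1.05 nats, assuming the standard definition of perplexity as the exponential of the loss:

```python
import math

reported_perplexity = 2.86
# Perplexity = exp(mean cross-entropy per token), so loss = ln(perplexity).
loss = math.log(reported_perplexity)
print(f"mean cross-entropy ~ {loss:.4f} nats per token")
```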

Framework versions

  • Transformers 4.26.1
  • TensorFlow 2.11.0
  • Datasets 2.10.0
  • Tokenizers 0.13.2

Citation

If you find this model helpful, please cite the following paper:

@INPROCEEDINGS{10303383,
  author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
  booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)}, 
  title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language}, 
  year={2023},
  volume={},
  number={},
  pages={56-59},
  doi={10.1109/ICICT4SD59951.2023.10303383}}