# opt-babylm2-subset-default-20-epochs-1e-3
This model was trained from scratch on the kanishka/babylm2-subset dataset. It achieves the following results on the evaluation set:
- Loss: 2.4350
- Accuracy: 0.5324
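For quick experimentation, the checkpoint can be loaded with the standard `transformers` causal-LM classes. The sketch below is illustrative only: the Hub repo id is an assumption inferred from the model name and the dataset owner, so adjust it to the actual checkpoint path.

```python
# Minimal loading/generation sketch; the repo id below is assumed, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kanishka/opt-babylm2-subset-default-20-epochs-1e-3"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The child picked up the", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```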
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
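As a rough guide, these values map onto `transformers.TrainingArguments` as sketched below. This is an assumed reconstruction, not the authors' actual training script; the output directory in particular is a placeholder.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="opt-babylm2-subset-default-20-epochs-1e-3",  # placeholder path
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=32_000,
    num_train_epochs=20.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```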
### Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
2.5365 | 1.0 | 14169 | 2.7500 | 0.4857 |
2.3708 | 2.0 | 28338 | 2.5870 | 0.5032 |
2.2572 | 3.0 | 42507 | 2.4839 | 0.5150 |
2.1958 | 4.0 | 56676 | 2.4295 | 0.5220 |
2.1251 | 5.0 | 70845 | 2.4013 | 0.5259 |
2.0769 | 6.0 | 85014 | 2.3830 | 0.5281 |
2.043 | 7.0 | 99183 | 2.3736 | 0.5304 |
2.007 | 8.0 | 113352 | 2.3671 | 0.5313 |
1.9813 | 9.0 | 127521 | 2.3661 | 0.5322 |
1.9593 | 10.0 | 141690 | 2.3705 | 0.5325 |
1.933 | 11.0 | 155859 | 2.3677 | 0.5331 |
1.9106 | 12.0 | 170028 | 2.3727 | 0.5333 |
1.8847 | 13.0 | 184197 | 2.3779 | 0.5335 |
1.8636 | 14.0 | 198366 | 2.3834 | 0.5335 |
1.8391 | 15.0 | 212535 | 2.3955 | 0.5334 |
1.8179 | 16.0 | 226704 | 2.4015 | 0.5332 |
1.7918 | 17.0 | 240873 | 2.4100 | 0.5331 |
1.7674 | 18.0 | 255042 | 2.4159 | 0.5330 |
1.751 | 19.0 | 269211 | 2.4263 | 0.5327 |
1.7338 | 20.0 | 283380 | 2.4350 | 0.5324 |
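Assuming the loss columns are mean token-level cross-entropy in nats (the Trainer default for causal language models), validation perplexity can be read off as `exp(loss)`; for the final epoch that is roughly 11.4.

```python
import math

# Perplexity from the final validation loss, assuming mean cross-entropy in nats.
final_val_loss = 2.4350
print(f"perplexity ~ {math.exp(final_val_loss):.2f}")  # ~ 11.42
```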
### Framework versions
- Transformers 4.42.4
- Pytorch 2.2.0+cu121
- Datasets 2.16.1
- Tokenizers 0.19.1