metadata

library_name: transformers
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: babylm-unablated_seed-42_1e-3
    results: []

babylm-unablated_seed-42_1e-3

This model was trained from scratch; the training dataset is not recorded in this card. At the final checkpoint (step 36040, epoch 19.9945) it achieves the following results on the evaluation set:

Loss: 3.0130
Accuracy: 0.4207
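
The loss above can be read as a perplexity if it is the mean per-token cross-entropy measured in nats. That is an assumption: this card does not state how the loss is defined or in which units. A minimal sketch of the conversion:

```python
import math

# Evaluation loss reported in this card.
eval_loss = 3.0130

# Assumption: the loss is the mean per-token cross-entropy in nats,
# in which case perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the perplexity comes out at a little over 20. If the loss were in bits, or averaged differently, the conversion would not apply as written.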

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
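
The batch-size and scheduler settings above interact in ways that are easier to see numerically. The sketch below checks the effective batch size and evaluates a linear warmup-then-decay schedule at a few steps. It is intended to mirror the shape of the Transformers linear schedule with warmup, not to reproduce the trainer's code. `TOTAL_STEPS` is an assumption taken from the last step logged in the training results (36040), and `implied_devices` and `linear_lr` are hypothetical names introduced here for illustration.

```python
import math

# Values copied from the hyperparameter list in this card.
BASE_LR = 1e-3
WARMUP_STEPS = 32_000
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
TOTAL_TRAIN_BATCH_SIZE = 256

# Assumption: the total number of optimizer steps equals the last step
# logged in the training results table.
TOTAL_STEPS = 36_040

# total batch = per-device batch * accumulation steps * number of devices,
# so the listed values imply the number of devices used.
implied_devices = TOTAL_TRAIN_BATCH_SIZE // (TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS)


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Share of training spent in warmup under the assumed total step count.
warmup_fraction = WARMUP_STEPS / TOTAL_STEPS

print("implied devices:", implied_devices)
print("warmup fraction:", round(warmup_fraction, 3))
for step in (0, 16_000, 32_000, 34_020, 36_040):
    print(step, linear_lr(step))
```

Under these assumptions, warmup covers roughly 89% of training, so the learning rate only reaches its peak of 0.001 near step 32000 and then decays to zero over the remaining steps. That late, fast decay may be related to the sharper drop in validation loss over the final two epochs, although the card itself does not confirm this.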

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.173	0.9997	1802	4.2465	0.3068
4.0432	2.0	3605	3.7284	0.3475
3.6166	2.9997	5407	3.4711	0.3707
3.3909	4.0	7210	3.3322	0.3836
3.2635	4.9997	9012	3.2560	0.3913
3.1946	6.0	10815	3.2042	0.3959
3.1312	6.9997	12617	3.1730	0.3994
3.0872	8.0	14420	3.1528	0.4014
3.0554	8.9997	16222	3.1347	0.4037
3.0338	10.0	18025	3.1247	0.4043
2.9873	10.9997	19827	3.1119	0.4059
2.9773	12.0	21630	3.1093	0.4065
2.9714	12.9997	23432	3.1020	0.4077
2.9667	14.0	25235	3.0977	0.4077
2.9621	14.9997	27037	3.0944	0.4085
2.9239	16.0	28840	3.0954	0.4082
2.9295	16.9997	30642	3.0936	0.4085
2.9329	18.0	32445	3.0771	0.4106
2.8605	18.9997	34247	3.0338	0.4164
2.7104	19.9945	36040	3.0130	0.4207
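
The step counts in the table, together with the total train batch size of 256, give a rough estimate of how many training sequences make up one epoch. This is a sketch only: it assumes every optimizer step consumes a full batch of 256 sequences, which the card does not state, and the variable names are introduced here for illustration.

```python
# (epoch, step) pairs copied from whole-epoch rows of the results table.
checkpoints = [(2.0, 3605), (10.0, 18025), (18.0, 32445)]

# Total train batch size from the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 256

# Optimizer steps per epoch, estimated independently from each row.
steps_per_epoch = [step / epoch for epoch, step in checkpoints]
avg_steps_per_epoch = sum(steps_per_epoch) / len(steps_per_epoch)

# Assumption: each optimizer step processes a full batch of 256 sequences.
sequences_per_epoch = avg_steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print("steps per epoch:", avg_steps_per_epoch)
print("approx. sequences per epoch:", int(sequences_per_epoch))
```

The fractional steps-per-epoch value is consistent with the alternating epoch values in the table (0.9997, 2.0, 2.9997, ...), where every other evaluation lands just short of an epoch boundary.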

Framework versions

Transformers 4.45.1
Pytorch 2.4.1+cu121
Datasets 2.19.1
Tokenizers 0.20.0