---
library_name: transformers
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: babylm-unablated_seed-42_1e-3
  results: []
---
babylm-unablated_seed-42_1e-3
This model was trained from scratch on an unknown dataset. At the last logged evaluation (step 36040, the end of training), it achieves the following results on the evaluation set:
- Loss: 3.0130
- Accuracy: 0.4207
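The card does not say what kind of language model this is or which tokenizer it uses. If the reported loss is the mean per-token cross-entropy in nats, which is what `Trainer` logs for language-modeling heads, it can be turned into perplexity with exp(loss). The sketch below rests on that assumption. Perplexity also depends on the tokenizer, so it is not comparable across models with different vocabularies.

```python
import math

# Final evaluation loss reported in this card.
EVAL_LOSS = 3.0130


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity."""
    return math.exp(mean_cross_entropy_nats)


if __name__ == "__main__":
    print(f"perplexity ~ {perplexity(EVAL_LOSS):.2f}")  # prints "perplexity ~ 20.35"
```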
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
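The hyperparameters above can be restated with the keyword names that `transformers.TrainingArguments` accepts. The sketch below is a plain dictionary, so it runs without `transformers` installed. It also shows where `total_train_batch_size` comes from: 32 sequences per device × 8 accumulation steps = 256. That arithmetic only works out on a single device, so the device count here is an assumption. So is the `fp16` flag, because "Native AMP" could equally mean `bf16=True`.

```python
# Hyperparameters from this card, written with the keyword names used by
# transformers.TrainingArguments. The device count and the fp16/bf16 choice
# are NOT stated in the card; both are assumptions in this sketch.
ASSUMED_NUM_DEVICES = 1  # assumption: consistent with 32 * 8 = 256

training_kwargs = {
    "learning_rate": 1e-3,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 64,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 32000,
    "num_train_epochs": 20.0,
    "fp16": True,  # assumption: "Native AMP" could also mean bf16=True
}


def total_train_batch_size(kwargs: dict, num_devices: int) -> int:
    """Number of sequences consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


if __name__ == "__main__":
    print(total_train_batch_size(training_kwargs, ASSUMED_NUM_DEVICES))  # prints 256
    # With transformers installed (and a CUDA device for fp16), the same dict
    # could be passed as TrainingArguments(output_dir="out", **training_kwargs);
    # "out" is a hypothetical directory name.
```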
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
6.173 | 0.9997 | 1802 | 4.2465 | 0.3068 |
4.0432 | 2.0 | 3605 | 3.7284 | 0.3475 |
3.6166 | 2.9997 | 5407 | 3.4711 | 0.3707 |
3.3909 | 4.0 | 7210 | 3.3322 | 0.3836 |
3.2635 | 4.9997 | 9012 | 3.2560 | 0.3913 |
3.1946 | 6.0 | 10815 | 3.2042 | 0.3959 |
3.1312 | 6.9997 | 12617 | 3.1730 | 0.3994 |
3.0872 | 8.0 | 14420 | 3.1528 | 0.4014 |
3.0554 | 8.9997 | 16222 | 3.1347 | 0.4037 |
3.0338 | 10.0 | 18025 | 3.1247 | 0.4043 |
2.9873 | 10.9997 | 19827 | 3.1119 | 0.4059 |
2.9773 | 12.0 | 21630 | 3.1093 | 0.4065 |
2.9714 | 12.9997 | 23432 | 3.1020 | 0.4077 |
2.9667 | 14.0 | 25235 | 3.0977 | 0.4077 |
2.9621 | 14.9997 | 27037 | 3.0944 | 0.4085 |
2.9239 | 16.0 | 28840 | 3.0954 | 0.4082 |
2.9295 | 16.9997 | 30642 | 3.0936 | 0.4085 |
2.9329 | 18.0 | 32445 | 3.0771 | 0.4106 |
2.8605 | 18.9997 | 34247 | 3.0338 | 0.4164 |
2.7104 | 19.9945 | 36040 | 3.0130 | 0.4207 |
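Training ran for 36040 steps with `lr_scheduler_warmup_steps: 32000`, so the linear schedule warms up for roughly 89% of training. The learning rate decays to zero only over the last ~4,040 steps. The largest late-stage improvement in validation loss, from 3.0771 at step 32445 to 3.0130 at step 36040, falls inside that decay window. The sketch below reproduces the shape of a linear warmup-then-decay schedule, the same rule that transformers' `get_linear_schedule_with_warmup` applies. It assumes the scheduler's total step count equals the final logged step.

```python
PEAK_LR = 1e-3
WARMUP_STEPS = 32_000
TOTAL_STEPS = 36_040  # assumption: equals the final logged step in the table


def linear_warmup_decay_lr(step: int) -> float:
    """Learning rate after linear warmup to PEAK_LR, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    # Evaluate the schedule at a few steps that appear in the results table.
    for step in (1802, 18025, 30642, 32000, 34247, 36040):
        print(f"step {step:>6}: lr = {linear_warmup_decay_lr(step):.2e}")
```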
Framework versions
- Transformers 4.45.1
- PyTorch 2.4.1+cu121
- Datasets 2.19.1
- Tokenizers 0.20.0