babylm-default_seed-42_1e-3
This model was trained from scratch; the training dataset was not recorded when this card was generated. It achieves the following results on the evaluation set:
- Loss: 3.0125
- Accuracy: 0.4208
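For a language model, the evaluation loss is often easier to interpret as a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is what the Hugging Face `Trainer` reports by default; the card itself does not state this.

```python
import math

eval_loss = 3.0125  # evaluation loss reported above

# Assumption: the loss is the mean cross-entropy per token, measured in nats.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 20.3.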
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
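The hyperparameters above can be cross-checked with a little arithmetic. The following is a minimal sketch, not the training script: it assumes a single device (which is what 32 x 8 = 256 implies), takes the total number of optimizer steps from the last row of the results table (35760), and re-implements the usual linear warmup and linear decay rule in plain Python. The function and variable names are hypothetical.

```python
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1          # assumption: not stated in the card
PEAK_LR = 1e-3
WARMUP_STEPS = 32_000
TOTAL_STEPS = 35_760     # assumption: final step in the results table

# Effective batch size per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


print(total_train_batch_size)
for step in (0, 16_000, 32_000, 33_880, 35_760):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate is still rising for roughly 89% of training (32000 of 35760 steps) and only decays during the last two epochs or so.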
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
6.173 | 0.9997 | 1788 | 4.2526 | 0.3061 |
4.0459 | 1.9999 | 3577 | 3.7287 | 0.3474 |
3.6173 | 2.9995 | 5365 | 3.4746 | 0.3704 |
3.395 | 3.9997 | 7154 | 3.3376 | 0.3836 |
3.3065 | 4.9999 | 8943 | 3.2582 | 0.3911 |
3.202 | 5.9996 | 10731 | 3.2096 | 0.3957 |
3.1358 | 6.9998 | 12520 | 3.1768 | 0.3991 |
3.0931 | 8.0 | 14309 | 3.1536 | 0.4014 |
3.0605 | 8.9997 | 16097 | 3.1361 | 0.4035 |
3.0176 | 9.9999 | 17886 | 3.1262 | 0.4047 |
2.9953 | 10.9995 | 19674 | 3.1186 | 0.4056 |
2.987 | 11.9997 | 21463 | 3.1099 | 0.4066 |
2.9794 | 12.9999 | 23252 | 3.1034 | 0.4076 |
2.9745 | 13.9996 | 25040 | 3.0990 | 0.4079 |
2.9327 | 14.9998 | 26829 | 3.0990 | 0.4078 |
2.9374 | 16.0 | 28618 | 3.0970 | 0.4082 |
2.9411 | 16.9997 | 30406 | 3.0878 | 0.4091 |
2.9448 | 17.9999 | 32195 | 3.0865 | 0.4099 |
2.89 | 18.9995 | 33983 | 3.0356 | 0.4161 |
2.7351 | 19.9930 | 35760 | 3.0125 | 0.4208 |
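The fractional values in the Epoch column are consistent with gradient accumulation over 8 micro-batches. As a hedged sketch: if one epoch contains 14309 micro-batches (a value inferred from the row where epoch 8.0 coincides with step 14309, not stated in the card), the Epoch column can be reproduced from the Step column. The names below are hypothetical.

```python
MICRO_BATCHES_PER_EPOCH = 14_309  # assumption: inferred from the table, not stated in the card
GRAD_ACCUM_STEPS = 8              # from the hyperparameter list


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    return step * GRAD_ACCUM_STEPS / MICRO_BATCHES_PER_EPOCH


for step in (1788, 3577, 14309, 35760):
    print(step, round(epoch_at(step), 4))
```

With 1788 optimizer steps per epoch, 5 of the 14309 micro-batches fall outside a full accumulation window, which is why each evaluation lands slightly before a whole-number epoch.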
Framework versions
- Transformers 4.45.1
- Pytorch 2.5.1+cu124
- Datasets 2.19.1
- Tokenizers 0.20.0