# collapse_gemma-2-2b_hs2_replace_iter3_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.8917
- Num Input Tokens Seen: 4953776
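As a minimal usage sketch, assuming the checkpoint is published under the hub id `RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd2` and behaves like any standard `transformers` causal LM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id taken from this model repository; swap in a local path if needed.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a sample prompt.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```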
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
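For reference, a hedged sketch of how these settings map onto `transformers.TrainingArguments`. The dataset, collator, and exact `Trainer` wiring are not documented in this card, so treat this as an approximate reconstruction, not the original training script:

```python
from transformers import TrainingArguments

# Approximate mapping of the hyperparameters listed above.
# Effective batch size: 8 (per device) x 16 (accumulation) = 128.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter3_sftsd2",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,     # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```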
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.662         | 0.0527 | 5    | 1.2677          | 263936            |
| 1.043         | 0.1053 | 10   | 1.2121          | 527776            |
| 0.8186        | 0.1580 | 15   | 1.3663          | 793808            |
| 0.5588        | 0.2107 | 20   | 1.4675          | 1057184           |
| 0.3341        | 0.2633 | 25   | 1.6131          | 1320008           |
| 0.2249        | 0.3160 | 30   | 1.7661          | 1582280           |
| 0.1707        | 0.3687 | 35   | 1.8590          | 1848456           |
| 0.0813        | 0.4213 | 40   | 1.9520          | 2110720           |
| 0.0719        | 0.4740 | 45   | 1.8883          | 2375976           |
| 0.0652        | 0.5267 | 50   | 1.9238          | 2633904           |
| 0.0556        | 0.5793 | 55   | 1.9031          | 2897008           |
| 0.0638        | 0.6320 | 60   | 1.8555          | 3161296           |
| 0.0524        | 0.6847 | 65   | 1.8461          | 3434104           |
| 0.0338        | 0.7373 | 70   | 1.8539          | 3694144           |
| 0.0549        | 0.7900 | 75   | 1.8739          | 3946736           |
| 0.0352        | 0.8427 | 80   | 1.8748          | 4208840           |
| 0.0425        | 0.8953 | 85   | 1.8757          | 4471136           |
| 0.0349        | 0.9480 | 90   | 1.8530          | 4741888           |
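As a quick sanity check on the reported metric, assuming the validation loss is mean token-level cross-entropy in nats (the usual `Trainer` convention), the final eval loss of 1.8917 corresponds to a perplexity of roughly exp(1.8917) ≈ 6.6:

```python
import math

eval_loss = 1.8917  # final evaluation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 6.63
```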
### Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1