# collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9448
- Num Input Tokens Seen: 14395896
## Model description
More information needed
## Intended uses & limitations
More information needed
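While the author has not documented intended uses, the checkpoint loads like any other Gemma-2 causal LM. Below is a minimal inference sketch (not author-provided); it assumes the standard `transformers` generation API (v4.44.0, per the framework versions listed later) and access to the gated Gemma-2 weights:

```python
# Minimal inference sketch (not author-provided); assumes transformers 4.44.0
# and that you have accepted the Gemma license to download the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 9B params; bf16 halves memory vs fp32
    device_map="auto",           # requires `accelerate` to place/shard layers
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```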
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
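These settings are consistent with a single-device run (4 per-device examples × 32 accumulation steps = 128 total train batch size). As an illustrative sketch only, since the actual training script is unpublished, they would map onto the `transformers` Trainer API roughly as:

```python
# Illustrative mapping of the listed hyperparameters onto transformers
# TrainingArguments (4.44.0). The author's actual training script is not
# published; field values mirror the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; precision is not stated
)
```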
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2638        | 0.0178 | 5    | 1.1350          | 263564            |
| 1.088         | 0.0356 | 10   | 1.0496          | 522096            |
| 0.9278        | 0.0534 | 15   | 1.0059          | 770488            |
| 0.7003        | 0.0712 | 20   | 1.0038          | 1030136           |
| 0.6094        | 0.0889 | 25   | 1.0117          | 1293096           |
| 0.5915        | 0.1067 | 30   | 1.0084          | 1544952           |
| 0.571         | 0.1245 | 35   | 1.0023          | 1798880           |
| 0.4553        | 0.1423 | 40   | 1.0002          | 2052424           |
| 0.4776        | 0.1601 | 45   | 0.9951          | 2308216           |
| 0.4561        | 0.1779 | 50   | 0.9884          | 2565080           |
| 0.4392        | 0.1957 | 55   | 0.9841          | 2825996           |
| 0.4753        | 0.2135 | 60   | 0.9797          | 3082260           |
| 0.4597        | 0.2313 | 65   | 0.9759          | 3328388           |
| 0.436         | 0.2491 | 70   | 0.9738          | 3584552           |
| 0.3907        | 0.2668 | 75   | 0.9703          | 3839180           |
| 0.4001        | 0.2846 | 80   | 0.9676          | 4100568           |
| 0.4112        | 0.3024 | 85   | 0.9671          | 4356852           |
| 0.4249        | 0.3202 | 90   | 0.9659          | 4610688           |
| 0.3945        | 0.3380 | 95   | 0.9654          | 4859752           |
| 0.5615        | 0.3558 | 100  | 0.9627          | 5108284           |
| 0.3528        | 0.3736 | 105  | 0.9619          | 5363428           |
| 0.3511        | 0.3914 | 110  | 0.9629          | 5623372           |
| 0.3744        | 0.4092 | 115  | 0.9600          | 5876016           |
| 0.4473        | 0.4270 | 120  | 0.9598          | 6139008           |
| 0.465         | 0.4447 | 125  | 0.9595          | 6392720           |
| 0.4511        | 0.4625 | 130  | 0.9568          | 6655704           |
| 0.3273        | 0.4803 | 135  | 0.9570          | 6909620           |
| 0.3689        | 0.4981 | 140  | 0.9575          | 7163740           |
| 0.3782        | 0.5159 | 145  | 0.9551          | 7424140           |
| 0.4371        | 0.5337 | 150  | 0.9541          | 7682936           |
| 0.3295        | 0.5515 | 155  | 0.9543          | 7939780           |
| 0.3631        | 0.5693 | 160  | 0.9533          | 8196216           |
| 0.4747        | 0.5871 | 165  | 0.9532          | 8457568           |
| 0.4171        | 0.6048 | 170  | 0.9545          | 8708980           |
| 0.4043        | 0.6226 | 175  | 0.9535          | 8963244           |
| 0.3966        | 0.6404 | 180  | 0.9523          | 9216124           |
| 0.487         | 0.6582 | 185  | 0.9520          | 9470216           |
| 0.4243        | 0.6760 | 190  | 0.9523          | 9726172           |
| 0.338         | 0.6938 | 195  | 0.9505          | 9978316           |
| 0.3794        | 0.7116 | 200  | 0.9510          | 10237320          |
| 0.4474        | 0.7294 | 205  | 0.9515          | 10498692          |
| 0.498         | 0.7472 | 210  | 0.9510          | 10755164          |
| 0.3557        | 0.7650 | 215  | 0.9505          | 11013492          |
| 0.3772        | 0.7827 | 220  | 0.9503          | 11263256          |
| 0.4487        | 0.8005 | 225  | 0.9509          | 11524460          |
| 0.3492        | 0.8183 | 230  | 0.9481          | 11776848          |
| 0.4046        | 0.8361 | 235  | 0.9483          | 12034428          |
| 0.3995        | 0.8539 | 240  | 0.9484          | 12301540          |
| 0.345         | 0.8717 | 245  | 0.9485          | 12558184          |
| 0.3618        | 0.8895 | 250  | 0.9476          | 12818680          |
| 0.286         | 0.9073 | 255  | 0.9476          | 13077536          |
| 0.368         | 0.9251 | 260  | 0.9487          | 13332544          |
| 0.3742        | 0.9429 | 265  | 0.9456          | 13585628          |
| 0.4091        | 0.9606 | 270  | 0.9465          | 13838300          |
| 0.3315        | 0.9784 | 275  | 0.9469          | 14090880          |
| 0.3664        | 0.9962 | 280  | 0.9449          | 14344624          |
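A back-of-the-envelope reading of this log (simple arithmetic on the table, not author-reported): the epoch column advances by about 0.00356 per optimizer step, so one epoch is roughly 281 steps, which at the 128-example effective batch implies a training set on the order of 36k examples:

```python
# Back-of-the-envelope sanity check derived from the table above; none of
# these figures are author-reported beyond what the log itself shows.
steps_per_epoch = 5 / 0.0178       # epoch column gains ~0.00356 per step
effective_batch = 4 * 32           # per-device batch * grad accumulation
approx_examples = steps_per_epoch * effective_batch
print(f"~{steps_per_epoch:.0f} steps/epoch, ~{approx_examples:,.0f} examples")
# -> ~281 steps/epoch, ~35,955 examples
```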
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1