# collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9467
- Num Input Tokens Seen: 14479720
## Model description

More information needed
Intended uses & limitations
More information needed
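No usage recipe is documented. For reference, a minimal loading sketch via the standard `transformers` causal-LM API (the prompt and generation settings are illustrative, not taken from this card):

```python
# Minimal loading sketch (assumed usage; the card does not document an
# official inference recipe). bfloat16 is a common choice for Gemma 2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```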
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
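The total batch size follows from 4 per-device examples × 32 gradient accumulation steps. As a hedged sketch, these settings map onto `transformers.TrainingArguments` roughly as follows (the model/dataset wiring and `output_dir` are assumptions, not taken from this card):

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as Trainer arguments, assuming a
# single-device run (4 per device * 32 accumulation steps = 128 total).
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```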
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.4137 | 0.0179 | 5 | 1.1303 | 252856 |
| 1.2151 | 0.0359 | 10 | 1.0399 | 513596 |
| 0.9459 | 0.0538 | 15 | 0.9985 | 774796 |
| 0.7877 | 0.0718 | 20 | 1.0018 | 1038520 |
| 0.6825 | 0.0897 | 25 | 1.0054 | 1297472 |
| 0.7017 | 0.1077 | 30 | 1.0039 | 1562648 |
| 0.556 | 0.1256 | 35 | 1.0021 | 1819784 |
| 0.5098 | 0.1436 | 40 | 0.9990 | 2083372 |
| 0.4798 | 0.1615 | 45 | 0.9966 | 2342284 |
| 0.4716 | 0.1795 | 50 | 0.9880 | 2603208 |
| 0.4492 | 0.1974 | 55 | 0.9852 | 2866544 |
| 0.5029 | 0.2153 | 60 | 0.9794 | 3124304 |
| 0.3482 | 0.2333 | 65 | 0.9775 | 3383204 |
| 0.4074 | 0.2512 | 70 | 0.9735 | 3640432 |
| 0.4432 | 0.2692 | 75 | 0.9713 | 3901272 |
| 0.4128 | 0.2871 | 80 | 0.9706 | 4166532 |
| 0.4293 | 0.3051 | 85 | 0.9697 | 4424764 |
| 0.2821 | 0.3230 | 90 | 0.9667 | 4679848 |
| 0.3497 | 0.3410 | 95 | 0.9671 | 4940480 |
| 0.4151 | 0.3589 | 100 | 0.9653 | 5199468 |
| 0.366 | 0.3769 | 105 | 0.9651 | 5457248 |
| 0.4383 | 0.3948 | 110 | 0.9628 | 5716508 |
| 0.5494 | 0.4127 | 115 | 0.9627 | 5982448 |
| 0.3396 | 0.4307 | 120 | 0.9612 | 6240068 |
| 0.416 | 0.4486 | 125 | 0.9602 | 6498568 |
| 0.3865 | 0.4666 | 130 | 0.9599 | 6757836 |
| 0.3436 | 0.4845 | 135 | 0.9588 | 7016324 |
| 0.3474 | 0.5025 | 140 | 0.9583 | 7273968 |
| 0.3378 | 0.5204 | 145 | 0.9566 | 7537436 |
| 0.5179 | 0.5384 | 150 | 0.9552 | 7805180 |
| 0.4688 | 0.5563 | 155 | 0.9555 | 8068284 |
| 0.4051 | 0.5742 | 160 | 0.9571 | 8328600 |
| 0.3992 | 0.5922 | 165 | 0.9531 | 8595768 |
| 0.4127 | 0.6101 | 170 | 0.9548 | 8853456 |
| 0.3901 | 0.6281 | 175 | 0.9533 | 9115420 |
| 0.466 | 0.6460 | 180 | 0.9522 | 9373484 |
| 0.3758 | 0.6640 | 185 | 0.9526 | 9633144 |
| 0.3675 | 0.6819 | 190 | 0.9542 | 9891312 |
| 0.3248 | 0.6999 | 195 | 0.9527 | 10151948 |
| 0.422 | 0.7178 | 200 | 0.9522 | 10417560 |
| 0.464 | 0.7358 | 205 | 0.9525 | 10675408 |
| 0.4374 | 0.7537 | 210 | 0.9505 | 10937468 |
| 0.3459 | 0.7716 | 215 | 0.9510 | 11198760 |
| 0.4153 | 0.7896 | 220 | 0.9505 | 11463912 |
| 0.3045 | 0.8075 | 225 | 0.9495 | 11723048 |
| 0.4015 | 0.8255 | 230 | 0.9516 | 11983792 |
| 0.4552 | 0.8434 | 235 | 0.9505 | 12241296 |
| 0.3746 | 0.8614 | 240 | 0.9490 | 12504660 |
| 0.3781 | 0.8793 | 245 | 0.9476 | 12765960 |
| 0.3656 | 0.8973 | 250 | 0.9496 | 13026072 |
| 0.3108 | 0.9152 | 255 | 0.9475 | 13285212 |
| 0.372 | 0.9332 | 260 | 0.9486 | 13546648 |
| 0.4381 | 0.9511 | 265 | 0.9493 | 13801364 |
| 0.416 | 0.9690 | 270 | 0.9488 | 14063576 |
| 0.3967 | 0.9870 | 275 | 0.9476 | 14329004 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
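A quick way to confirm a local environment matches these pins (an illustrative check, not part of the original card):

```python
# Compare installed library versions against the pins listed above.
import datasets
import tokenizers
import torch
import transformers

pins = {
    "transformers": (transformers.__version__, "4.44.0"),
    "torch": (torch.__version__, "2.4.0+cu121"),
    "datasets": (datasets.__version__, "2.20.0"),
    "tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (installed, pinned) in pins.items():
    marker = "OK" if installed == pinned else f"differs (pinned: {pinned})"
    print(f"{name} {installed}: {marker}")
```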