collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.9467
  • Num Input Tokens Seen: 14479720
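
The checkpoint is stored in BF16 with about 9.24B parameters, so the weights alone need roughly 18-19 GB of accelerator memory. Below is a minimal loading sketch with transformers, assuming the Hub repo id RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2 and that accelerate is installed for automatic device placement:

```python
# Sketch: load the checkpoint for inference. The repo id comes from this
# card; dtype matches the BF16 tensors it reports.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```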

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
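
These values map directly onto transformers' TrainingArguments, as in the sketch below. Only the hyperparameter values come from the card; the output directory and bf16 flag are assumptions, the Adam betas and epsilon are the Trainer defaults, and the actual dataset and training script are unknown:

```python
# Sketch: reproduce the reported hyperparameters with the Trainer API.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: matches the BF16 checkpoint
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults.
)
```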

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.4137        | 0.0179 | 5    | 1.1303          | 252856            |
| 1.2151        | 0.0359 | 10   | 1.0399          | 513596            |
| 0.9459        | 0.0538 | 15   | 0.9985          | 774796            |
| 0.7877        | 0.0718 | 20   | 1.0018          | 1038520           |
| 0.6825        | 0.0897 | 25   | 1.0054          | 1297472           |
| 0.7017        | 0.1077 | 30   | 1.0039          | 1562648           |
| 0.556         | 0.1256 | 35   | 1.0021          | 1819784           |
| 0.5098        | 0.1436 | 40   | 0.9990          | 2083372           |
| 0.4798        | 0.1615 | 45   | 0.9966          | 2342284           |
| 0.4716        | 0.1795 | 50   | 0.9880          | 2603208           |
| 0.4492        | 0.1974 | 55   | 0.9852          | 2866544           |
| 0.5029        | 0.2153 | 60   | 0.9794          | 3124304           |
| 0.3482        | 0.2333 | 65   | 0.9775          | 3383204           |
| 0.4074        | 0.2512 | 70   | 0.9735          | 3640432           |
| 0.4432        | 0.2692 | 75   | 0.9713          | 3901272           |
| 0.4128        | 0.2871 | 80   | 0.9706          | 4166532           |
| 0.4293        | 0.3051 | 85   | 0.9697          | 4424764           |
| 0.2821        | 0.3230 | 90   | 0.9667          | 4679848           |
| 0.3497        | 0.3410 | 95   | 0.9671          | 4940480           |
| 0.4151        | 0.3589 | 100  | 0.9653          | 5199468           |
| 0.366         | 0.3769 | 105  | 0.9651          | 5457248           |
| 0.4383        | 0.3948 | 110  | 0.9628          | 5716508           |
| 0.5494        | 0.4127 | 115  | 0.9627          | 5982448           |
| 0.3396        | 0.4307 | 120  | 0.9612          | 6240068           |
| 0.416         | 0.4486 | 125  | 0.9602          | 6498568           |
| 0.3865        | 0.4666 | 130  | 0.9599          | 6757836           |
| 0.3436        | 0.4845 | 135  | 0.9588          | 7016324           |
| 0.3474        | 0.5025 | 140  | 0.9583          | 7273968           |
| 0.3378        | 0.5204 | 145  | 0.9566          | 7537436           |
| 0.5179        | 0.5384 | 150  | 0.9552          | 7805180           |
| 0.4688        | 0.5563 | 155  | 0.9555          | 8068284           |
| 0.4051        | 0.5742 | 160  | 0.9571          | 8328600           |
| 0.3992        | 0.5922 | 165  | 0.9531          | 8595768           |
| 0.4127        | 0.6101 | 170  | 0.9548          | 8853456           |
| 0.3901        | 0.6281 | 175  | 0.9533          | 9115420           |
| 0.466         | 0.6460 | 180  | 0.9522          | 9373484           |
| 0.3758        | 0.6640 | 185  | 0.9526          | 9633144           |
| 0.3675        | 0.6819 | 190  | 0.9542          | 9891312           |
| 0.3248        | 0.6999 | 195  | 0.9527          | 10151948          |
| 0.422         | 0.7178 | 200  | 0.9522          | 10417560          |
| 0.464         | 0.7358 | 205  | 0.9525          | 10675408          |
| 0.4374        | 0.7537 | 210  | 0.9505          | 10937468          |
| 0.3459        | 0.7716 | 215  | 0.9510          | 11198760          |
| 0.4153        | 0.7896 | 220  | 0.9505          | 11463912          |
| 0.3045        | 0.8075 | 225  | 0.9495          | 11723048          |
| 0.4015        | 0.8255 | 230  | 0.9516          | 11983792          |
| 0.4552        | 0.8434 | 235  | 0.9505          | 12241296          |
| 0.3746        | 0.8614 | 240  | 0.9490          | 12504660          |
| 0.3781        | 0.8793 | 245  | 0.9476          | 12765960          |
| 0.3656        | 0.8973 | 250  | 0.9496          | 13026072          |
| 0.3108        | 0.9152 | 255  | 0.9475          | 13285212          |
| 0.372         | 0.9332 | 260  | 0.9486          | 13546648          |
| 0.4381        | 0.9511 | 265  | 0.9493          | 13801364          |
| 0.416         | 0.9690 | 270  | 0.9488          | 14063576          |
| 0.3967        | 0.9870 | 275  | 0.9476          | 14329004          |
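
The training loss keeps falling (from about 1.41 to about 0.40) while the validation loss plateaus around 0.95 after roughly step 40. A minimal sketch to visualize this, assuming the table above has been saved as a hypothetical training_log.csv with the same five column names:

```python
# Sketch: plot the loss curves from the table above.
import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv("training_log.csv")  # hypothetical filename
# The first training-loss entry is "No log", so coerce it to NaN.
log["Training Loss"] = pd.to_numeric(log["Training Loss"], errors="coerce")

plt.plot(log["Step"], log["Training Loss"], label="training loss")
plt.plot(log["Step"], log["Validation Loss"], label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```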

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
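
A quick sketch to confirm that a local environment matches these reported versions:

```python
# Sketch: print installed versions to compare against the card.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # card reports 4.44.0
print(torch.__version__)         # card reports 2.4.0+cu121
print(datasets.__version__)      # card reports 2.20.0
print(tokenizers.__version__)    # card reports 0.19.1
```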