collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9448
  • Number of input tokens seen: 14,395,896
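
Below is a minimal inference sketch using the Transformers library. The repository ID is taken from the model tree at the end of this card, and BF16 loading matches the stored tensor type; treat this as a starting point rather than an official usage recipe.

```python
# Minimal inference sketch (not an official recipe from this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint tensors are stored in BF16
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```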

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
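
As a hedged illustration, these settings map onto transformers.TrainingArguments roughly as follows. The actual training script is not part of this card; in particular, the total train batch size of 128 is consistent with a single device at batch size 4 with 32 accumulation steps, and Trainer's default AdamW optimizer already uses the listed betas and epsilon.

```python
# Sketch only: mirrors the hyperparameters listed above, not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 effective batch (single device assumed)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Trainer's AdamW defaults, shown for completeness
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption, based on the BF16 tensor type noted below
)
```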

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2638        | 0.0178 | 5    | 1.1350          | 263564            |
| 1.088         | 0.0356 | 10   | 1.0496          | 522096            |
| 0.9278        | 0.0534 | 15   | 1.0059          | 770488            |
| 0.7003        | 0.0712 | 20   | 1.0038          | 1030136           |
| 0.6094        | 0.0889 | 25   | 1.0117          | 1293096           |
| 0.5915        | 0.1067 | 30   | 1.0084          | 1544952           |
| 0.571         | 0.1245 | 35   | 1.0023          | 1798880           |
| 0.4553        | 0.1423 | 40   | 1.0002          | 2052424           |
| 0.4776        | 0.1601 | 45   | 0.9951          | 2308216           |
| 0.4561        | 0.1779 | 50   | 0.9884          | 2565080           |
| 0.4392        | 0.1957 | 55   | 0.9841          | 2825996           |
| 0.4753        | 0.2135 | 60   | 0.9797          | 3082260           |
| 0.4597        | 0.2313 | 65   | 0.9759          | 3328388           |
| 0.436         | 0.2491 | 70   | 0.9738          | 3584552           |
| 0.3907        | 0.2668 | 75   | 0.9703          | 3839180           |
| 0.4001        | 0.2846 | 80   | 0.9676          | 4100568           |
| 0.4112        | 0.3024 | 85   | 0.9671          | 4356852           |
| 0.4249        | 0.3202 | 90   | 0.9659          | 4610688           |
| 0.3945        | 0.3380 | 95   | 0.9654          | 4859752           |
| 0.5615        | 0.3558 | 100  | 0.9627          | 5108284           |
| 0.3528        | 0.3736 | 105  | 0.9619          | 5363428           |
| 0.3511        | 0.3914 | 110  | 0.9629          | 5623372           |
| 0.3744        | 0.4092 | 115  | 0.9600          | 5876016           |
| 0.4473        | 0.4270 | 120  | 0.9598          | 6139008           |
| 0.465         | 0.4447 | 125  | 0.9595          | 6392720           |
| 0.4511        | 0.4625 | 130  | 0.9568          | 6655704           |
| 0.3273        | 0.4803 | 135  | 0.9570          | 6909620           |
| 0.3689        | 0.4981 | 140  | 0.9575          | 7163740           |
| 0.3782        | 0.5159 | 145  | 0.9551          | 7424140           |
| 0.4371        | 0.5337 | 150  | 0.9541          | 7682936           |
| 0.3295        | 0.5515 | 155  | 0.9543          | 7939780           |
| 0.3631        | 0.5693 | 160  | 0.9533          | 8196216           |
| 0.4747        | 0.5871 | 165  | 0.9532          | 8457568           |
| 0.4171        | 0.6048 | 170  | 0.9545          | 8708980           |
| 0.4043        | 0.6226 | 175  | 0.9535          | 8963244           |
| 0.3966        | 0.6404 | 180  | 0.9523          | 9216124           |
| 0.487         | 0.6582 | 185  | 0.9520          | 9470216           |
| 0.4243        | 0.6760 | 190  | 0.9523          | 9726172           |
| 0.338         | 0.6938 | 195  | 0.9505          | 9978316           |
| 0.3794        | 0.7116 | 200  | 0.9510          | 10237320          |
| 0.4474        | 0.7294 | 205  | 0.9515          | 10498692          |
| 0.498         | 0.7472 | 210  | 0.9510          | 10755164          |
| 0.3557        | 0.7650 | 215  | 0.9505          | 11013492          |
| 0.3772        | 0.7827 | 220  | 0.9503          | 11263256          |
| 0.4487        | 0.8005 | 225  | 0.9509          | 11524460          |
| 0.3492        | 0.8183 | 230  | 0.9481          | 11776848          |
| 0.4046        | 0.8361 | 235  | 0.9483          | 12034428          |
| 0.3995        | 0.8539 | 240  | 0.9484          | 12301540          |
| 0.345         | 0.8717 | 245  | 0.9485          | 12558184          |
| 0.3618        | 0.8895 | 250  | 0.9476          | 12818680          |
| 0.286         | 0.9073 | 255  | 0.9476          | 13077536          |
| 0.368         | 0.9251 | 260  | 0.9487          | 13332544          |
| 0.3742        | 0.9429 | 265  | 0.9456          | 13585628          |
| 0.4091        | 0.9606 | 270  | 0.9465          | 13838300          |
| 0.3315        | 0.9784 | 275  | 0.9469          | 14090880          |
| 0.3664        | 0.9962 | 280  | 0.9449          | 14344624          |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
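
To pin a matching environment, the versions above translate directly into a requirements file (a sketch; the `+cu121` PyTorch build additionally requires installing from the CUDA 12.1 wheel index):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```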

Model size: 9.24B params (Safetensors, tensor type BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1

  • Base model: google/gemma-2-9b
  • This model is one of its 193 fine-tunes.