collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a brief loading sketch follows the list):

  • Loss: 1.0884
  • Num Input Tokens Seen: 15872728
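
Since the descriptive sections below are still placeholders, here is a minimal loading sketch. It is not part of the original card: the repo id is taken from the card itself, while the dtype, device placement, and prompt are illustrative assumptions.

```python
# Minimal sketch for loading this checkpoint with the transformers auto classes.
# Assumptions: BF16 weights (as reported on the model page) and an available
# accelerator; the prompt is purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint tensors are stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```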

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch reconstructing them with transformers.TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
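
As referenced above, this is a hedged sketch mirroring these settings with the Trainer API. The original training script is not provided, so the output directory and any omitted arguments are assumptions; note that 8 × 16 = 128, consistent with the reported total_train_batch_size (i.e., a single device with gradient accumulation).

```python
# Hedged reconstruction of the hyperparameters above with TrainingArguments.
# The actual training script is not provided; output_dir is hypothetical and
# the dataset/model setup is omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    gradient_accumulation_steps=16,   # effective batch size: 8 * 16 = 128
    num_train_epochs=1,
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```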

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.474         | 0.0177 | 5    | 1.3567          | 285720            |
| 1.4369        | 0.0354 | 10   | 1.2496          | 565232            |
| 1.2263        | 0.0531 | 15   | 1.1836          | 850968            |
| 1.058         | 0.0709 | 20   | 1.1574          | 1131032           |
| 1.0558        | 0.0886 | 25   | 1.1384          | 1415088           |
| 0.9716        | 0.1063 | 30   | 1.1342          | 1699200           |
| 0.9471        | 0.1240 | 35   | 1.1414          | 1983400           |
| 0.871         | 0.1417 | 40   | 1.1586          | 2259680           |
| 0.8638        | 0.1594 | 45   | 1.1617          | 2532392           |
| 0.7182        | 0.1771 | 50   | 1.1642          | 2810528           |
| 0.7555        | 0.1949 | 55   | 1.1530          | 3099608           |
| 0.6293        | 0.2126 | 60   | 1.1573          | 3382000           |
| 0.7471        | 0.2303 | 65   | 1.1435          | 3664200           |
| 0.7487        | 0.2480 | 70   | 1.1445          | 3950688           |
| 0.6169        | 0.2657 | 75   | 1.1419          | 4230496           |
| 0.5751        | 0.2834 | 80   | 1.1417          | 4507120           |
| 0.5456        | 0.3012 | 85   | 1.1350          | 4786632           |
| 0.6307        | 0.3189 | 90   | 1.1295          | 5069384           |
| 0.6725        | 0.3366 | 95   | 1.1301          | 5352256           |
| 0.6452        | 0.3543 | 100  | 1.1266          | 5635872           |
| 0.5572        | 0.3720 | 105  | 1.1269          | 5913352           |
| 0.5333        | 0.3897 | 110  | 1.1220          | 6195264           |
| 0.5336        | 0.4074 | 115  | 1.1193          | 6482200           |
| 0.5775        | 0.4252 | 120  | 1.1233          | 6757120           |
| 0.5249        | 0.4429 | 125  | 1.1182          | 7043160           |
| 0.5661        | 0.4606 | 130  | 1.1146          | 7324248           |
| 0.3956        | 0.4783 | 135  | 1.1141          | 7610520           |
| 0.4829        | 0.4960 | 140  | 1.1137          | 7886808           |
| 0.433         | 0.5137 | 145  | 1.1106          | 8169464           |
| 0.5709        | 0.5314 | 150  | 1.1096          | 8446496           |
| 0.4519        | 0.5492 | 155  | 1.1087          | 8724352           |
| 0.5516        | 0.5669 | 160  | 1.1088          | 9001512           |
| 0.4438        | 0.5846 | 165  | 1.1054          | 9287232           |
| 0.464         | 0.6023 | 170  | 1.1069          | 9572824           |
| 0.5425        | 0.6200 | 175  | 1.1035          | 9852520           |
| 0.4022        | 0.6377 | 180  | 1.1044          | 10135104          |
| 0.6573        | 0.6554 | 185  | 1.1008          | 10419320          |
| 0.5222        | 0.6732 | 190  | 1.1032          | 10699120          |
| 0.5912        | 0.6909 | 195  | 1.1012          | 10975480          |
| 0.4845        | 0.7086 | 200  | 1.0997          | 11258720          |
| 0.5564        | 0.7263 | 205  | 1.0996          | 11541392          |
| 0.4095        | 0.7440 | 210  | 1.1012          | 11823104          |
| 0.4972        | 0.7617 | 215  | 1.0973          | 12106184          |
| 0.5316        | 0.7795 | 220  | 1.0985          | 12386192          |
| 0.4829        | 0.7972 | 225  | 1.0973          | 12667440          |
| 0.5517        | 0.8149 | 230  | 1.0951          | 12946864          |
| 0.5426        | 0.8326 | 235  | 1.0952          | 13228840          |
| 0.4625        | 0.8503 | 240  | 1.0943          | 13511512          |
| 0.6167        | 0.8680 | 245  | 1.0935          | 13788608          |
| 0.5621        | 0.8857 | 250  | 1.0924          | 14071152          |
| 0.4886        | 0.9035 | 255  | 1.0923          | 14352392          |
| 0.5573        | 0.9212 | 260  | 1.0907          | 14637472          |
| 0.4458        | 0.9389 | 265  | 1.0913          | 14920752          |
| 0.524         | 0.9566 | 270  | 1.0897          | 15194904          |
| 0.5246        | 0.9743 | 275  | 1.0898          | 15477960          |
| 0.3902        | 0.9920 | 280  | 1.0898          | 15763792          |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
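
A small optional check (not from the original card) that the local environment matches the versions listed above:

```python
# Compare installed package versions against those reported on this card.
import datasets, tokenizers, torch, transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    got = installed[name]
    status = "OK" if got == want else f"mismatch (got {got})"
    print(f"{name}: expected {want} -> {status}")
```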