collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a short usage sketch follows the results):

  • Loss: 1.0904
  • Num Input Tokens Seen: 15599864
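A minimal inference sketch with the Hugging Face Transformers API, assuming the checkpoint loads like any gemma-2-2b fine-tune; the prompt and generation settings are illustrative, and loading in bfloat16 matches the checkpoint's published tensor type.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository id for this checkpoint.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("The meaning of life is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```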

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
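As a sketch only, not the recorded training script, these settings translate into Hugging Face `TrainingArguments` roughly as follows. The output directory, BF16 flag, and evaluation/logging cadence are assumptions inferred from the card rather than logged values.

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as TrainingArguments.
# Effective batch size: 8 per device x 16 accumulation steps = 128.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,              # assumed from the BF16 checkpoint
    eval_strategy="steps",  # the results table shows evaluation every 5 steps
    eval_steps=5,
    logging_steps=5,
)
```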

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.6915 | 0.0179 | 5 | 1.3538 | 280664 |
| 1.5025 | 0.0358 | 10 | 1.2398 | 557488 |
| 1.3663 | 0.0536 | 15 | 1.1737 | 840856 |
| 1.1851 | 0.0715 | 20 | 1.1503 | 1124920 |
| 1.0716 | 0.0894 | 25 | 1.1379 | 1402408 |
| 0.9914 | 0.1073 | 30 | 1.1383 | 1675520 |
| 0.9719 | 0.1252 | 35 | 1.1480 | 1951472 |
| 0.9529 | 0.1430 | 40 | 1.1500 | 2225736 |
| 0.8376 | 0.1609 | 45 | 1.1498 | 2510856 |
| 0.8176 | 0.1788 | 50 | 1.1574 | 2787800 |
| 0.7634 | 0.1967 | 55 | 1.1589 | 3060664 |
| 0.8431 | 0.2146 | 60 | 1.1481 | 3341424 |
| 0.6527 | 0.2325 | 65 | 1.1534 | 3619016 |
| 0.6280 | 0.2503 | 70 | 1.1462 | 3897968 |
| 0.6262 | 0.2682 | 75 | 1.1411 | 4178920 |
| 0.7141 | 0.2861 | 80 | 1.1413 | 4459624 |
| 0.5843 | 0.3040 | 85 | 1.1416 | 4744144 |
| 0.6152 | 0.3219 | 90 | 1.1354 | 5023280 |
| 0.5608 | 0.3397 | 95 | 1.1409 | 5305880 |
| 0.6328 | 0.3576 | 100 | 1.1331 | 5583648 |
| 0.5968 | 0.3755 | 105 | 1.1343 | 5858848 |
| 0.4929 | 0.3934 | 110 | 1.1303 | 6140520 |
| 0.5384 | 0.4113 | 115 | 1.1285 | 6418144 |
| 0.6241 | 0.4291 | 120 | 1.1240 | 6699248 |
| 0.5110 | 0.4470 | 125 | 1.1238 | 6981672 |
| 0.5549 | 0.4649 | 130 | 1.1240 | 7259432 |
| 0.5711 | 0.4828 | 135 | 1.1193 | 7540672 |
| 0.5146 | 0.5007 | 140 | 1.1201 | 7817576 |
| 0.4929 | 0.5186 | 145 | 1.1161 | 8095624 |
| 0.6243 | 0.5364 | 150 | 1.1159 | 8372336 |
| 0.5050 | 0.5543 | 155 | 1.1139 | 8654856 |
| 0.5097 | 0.5722 | 160 | 1.1130 | 8927360 |
| 0.4289 | 0.5901 | 165 | 1.1105 | 9206880 |
| 0.5167 | 0.6080 | 170 | 1.1087 | 9485672 |
| 0.5748 | 0.6258 | 175 | 1.1068 | 9767928 |
| 0.5217 | 0.6437 | 180 | 1.1057 | 10050896 |
| 0.5644 | 0.6616 | 185 | 1.1029 | 10330480 |
| 0.4453 | 0.6795 | 190 | 1.1050 | 10608400 |
| 0.4872 | 0.6974 | 195 | 1.1007 | 10887048 |
| 0.5595 | 0.7152 | 200 | 1.1024 | 11167464 |
| 0.5560 | 0.7331 | 205 | 1.0992 | 11446280 |
| 0.5089 | 0.7510 | 210 | 1.1001 | 11731144 |
| 0.5189 | 0.7689 | 215 | 1.0985 | 12011960 |
| 0.4552 | 0.7868 | 220 | 1.0964 | 12292104 |
| 0.4871 | 0.8046 | 225 | 1.0996 | 12570976 |
| 0.5506 | 0.8225 | 230 | 1.0935 | 12857496 |
| 0.5102 | 0.8404 | 235 | 1.0960 | 13141736 |
| 0.4703 | 0.8583 | 240 | 1.0955 | 13420224 |
| 0.4595 | 0.8762 | 245 | 1.0916 | 13698216 |
| 0.5256 | 0.8941 | 250 | 1.0931 | 13982192 |
| 0.4640 | 0.9119 | 255 | 1.0934 | 14260960 |
| 0.4848 | 0.9298 | 260 | 1.0908 | 14538976 |
| 0.5636 | 0.9477 | 265 | 1.0911 | 14815712 |
| 0.5172 | 0.9656 | 270 | 1.0909 | 15101872 |
| 0.4533 | 0.9835 | 275 | 1.0907 | 15373280 |
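Two trends are visible in the table: training loss keeps falling (from about 1.69 down to roughly 0.45) while validation loss drops quickly to about 1.14 within the first 25 steps and then improves only slowly toward 1.09. A throwaway sketch for plotting the validation curve follows; the points are transcribed (and abbreviated) from the table above.

```python
import matplotlib.pyplot as plt

# (step, validation loss) pairs transcribed from the results table
# (abbreviated; add the remaining rows for the full curve).
points = [
    (0, 1.3909), (5, 1.3538), (10, 1.2398), (25, 1.1379),
    (50, 1.1574), (100, 1.1331), (150, 1.1159), (200, 1.1024),
    (250, 1.0931), (275, 1.0907),
]
steps, val_loss = zip(*points)

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2")
plt.show()
```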

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1