collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0909
  • Num Input Tokens Seen: 25913464
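The card omits a usage snippet; below is a minimal loading sketch, assuming the standard transformers causal-LM API and the repo id this card is published under (see the model tree at the bottom):

```python
# Minimal loading sketch -- standard transformers API; the repo id is the
# one this card is published under (see the model tree below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```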

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
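These settings map onto a transformers TrainingArguments configuration roughly as follows. This is a hedged reconstruction from the list above, not the original training script; the dataset, Trainer wiring, and output directory are not documented in this card.

```python
# Hedged reconstruction of the hyperparameters above as TrainingArguments.
# output_dir is a placeholder; dataset and Trainer setup are not documented.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas/epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption, based on the BF16 tensor type
)
```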

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5757 | 0.0106 | 5 | 1.3785 | 273728 |
| 1.5086 | 0.0212 | 10 | 1.3024 | 553176 |
| 1.3703 | 0.0318 | 15 | 1.2301 | 832928 |
| 1.2237 | 0.0424 | 20 | 1.1798 | 1112400 |
| 1.1043 | 0.0530 | 25 | 1.1741 | 1387256 |
| 0.8871 | 0.0636 | 30 | 1.1676 | 1667816 |
| 0.8128 | 0.0742 | 35 | 1.1807 | 1935720 |
| 0.8159 | 0.0848 | 40 | 1.1931 | 2212104 |
| 0.7139 | 0.0955 | 45 | 1.2108 | 2488864 |
| 0.6054 | 0.1061 | 50 | 1.1934 | 2759968 |
| 0.5794 | 0.1167 | 55 | 1.1874 | 3037768 |
| 0.4857 | 0.1273 | 60 | 1.1861 | 3315040 |
| 0.5228 | 0.1379 | 65 | 1.1744 | 3590680 |
| 0.5009 | 0.1485 | 70 | 1.1665 | 3866264 |
| 0.4853 | 0.1591 | 75 | 1.1741 | 4138640 |
| 0.4493 | 0.1697 | 80 | 1.1581 | 4408560 |
| 0.4206 | 0.1803 | 85 | 1.1612 | 4676520 |
| 0.3377 | 0.1909 | 90 | 1.1532 | 4956920 |
| 0.3708 | 0.2015 | 95 | 1.1524 | 5230480 |
| 0.4861 | 0.2121 | 100 | 1.1467 | 5510432 |
| 0.415 | 0.2227 | 105 | 1.1487 | 5783888 |
| 0.3656 | 0.2333 | 110 | 1.1439 | 6059904 |
| 0.4284 | 0.2439 | 115 | 1.1477 | 6333552 |
| 0.3727 | 0.2545 | 120 | 1.1430 | 6607432 |
| 0.4572 | 0.2651 | 125 | 1.1448 | 6884048 |
| 0.3842 | 0.2758 | 130 | 1.1388 | 7161200 |
| 0.3452 | 0.2864 | 135 | 1.1418 | 7443528 |
| 0.3085 | 0.2970 | 140 | 1.1353 | 7719360 |
| 0.4154 | 0.3076 | 145 | 1.1353 | 8001024 |
| 0.3739 | 0.3182 | 150 | 1.1316 | 8281392 |
| 0.3435 | 0.3288 | 155 | 1.1313 | 8553600 |
| 0.356 | 0.3394 | 160 | 1.1337 | 8825544 |
| 0.3751 | 0.3500 | 165 | 1.1262 | 9098040 |
| 0.3788 | 0.3606 | 170 | 1.1268 | 9377472 |
| 0.3203 | 0.3712 | 175 | 1.1266 | 9649408 |
| 0.3023 | 0.3818 | 180 | 1.1224 | 9930488 |
| 0.3961 | 0.3924 | 185 | 1.1217 | 10204672 |
| 0.4728 | 0.4030 | 190 | 1.1191 | 10476840 |
| 0.3212 | 0.4136 | 195 | 1.1211 | 10748672 |
| 0.3261 | 0.4242 | 200 | 1.1176 | 11022304 |
| 0.2691 | 0.4348 | 205 | 1.1170 | 11294832 |
| 0.2953 | 0.4454 | 210 | 1.1151 | 11571256 |
| 0.3242 | 0.4561 | 215 | 1.1162 | 11845312 |
| 0.3608 | 0.4667 | 220 | 1.1142 | 12124880 |
| 0.3344 | 0.4773 | 225 | 1.1133 | 12396192 |
| 0.2966 | 0.4879 | 230 | 1.1142 | 12663864 |
| 0.3665 | 0.4985 | 235 | 1.1141 | 12938920 |
| 0.3217 | 0.5091 | 240 | 1.1155 | 13209424 |
| 0.3376 | 0.5197 | 245 | 1.1119 | 13482760 |
| 0.3636 | 0.5303 | 250 | 1.1130 | 13749552 |
| 0.3988 | 0.5409 | 255 | 1.1115 | 14022304 |
| 0.361 | 0.5515 | 260 | 1.1087 | 14298840 |
| 0.3727 | 0.5621 | 265 | 1.1117 | 14569648 |
| 0.3881 | 0.5727 | 270 | 1.1083 | 14844120 |
| 0.324 | 0.5833 | 275 | 1.1086 | 15119496 |
| 0.4137 | 0.5939 | 280 | 1.1079 | 15395456 |
| 0.4208 | 0.6045 | 285 | 1.1058 | 15671704 |
| 0.2808 | 0.6151 | 290 | 1.1065 | 15944040 |
| 0.2928 | 0.6257 | 295 | 1.1055 | 16220520 |
| 0.4027 | 0.6364 | 300 | 1.1075 | 16491504 |
| 0.2943 | 0.6470 | 305 | 1.1053 | 16765024 |
| 0.3012 | 0.6576 | 310 | 1.1059 | 17039080 |
| 0.2789 | 0.6682 | 315 | 1.1039 | 17318648 |
| 0.3305 | 0.6788 | 320 | 1.1030 | 17596848 |
| 0.321 | 0.6894 | 325 | 1.1018 | 17870976 |
| 0.3127 | 0.7000 | 330 | 1.1039 | 18137760 |
| 0.3792 | 0.7106 | 335 | 1.1030 | 18410248 |
| 0.3946 | 0.7212 | 340 | 1.0999 | 18677968 |
| 0.334 | 0.7318 | 345 | 1.1031 | 18947432 |
| 0.3146 | 0.7424 | 350 | 1.1030 | 19227968 |
| 0.3158 | 0.7530 | 355 | 1.0988 | 19509360 |
| 0.2907 | 0.7636 | 360 | 1.1000 | 19785616 |
| 0.4204 | 0.7742 | 365 | 1.1001 | 20056848 |
| 0.2924 | 0.7848 | 370 | 1.1002 | 20335856 |
| 0.3222 | 0.7954 | 375 | 1.0997 | 20613064 |
| 0.3221 | 0.8060 | 380 | 1.0989 | 20884992 |
| 0.3005 | 0.8167 | 385 | 1.0967 | 21162232 |
| 0.3183 | 0.8273 | 390 | 1.0968 | 21438576 |
| 0.3396 | 0.8379 | 395 | 1.0980 | 21715544 |
| 0.3205 | 0.8485 | 400 | 1.0947 | 21988384 |
| 0.3199 | 0.8591 | 405 | 1.0972 | 22266120 |
| 0.314 | 0.8697 | 410 | 1.0939 | 22539560 |
| 0.4633 | 0.8803 | 415 | 1.0941 | 22813776 |
| 0.3282 | 0.8909 | 420 | 1.0940 | 23090296 |
| 0.3576 | 0.9015 | 425 | 1.0933 | 23369344 |
| 0.3411 | 0.9121 | 430 | 1.0934 | 23645208 |
| 0.2557 | 0.9227 | 435 | 1.0935 | 23919016 |
| 0.4153 | 0.9333 | 440 | 1.0922 | 24194664 |
| 0.3082 | 0.9439 | 445 | 1.0929 | 24470512 |
| 0.2994 | 0.9545 | 450 | 1.0925 | 24748488 |
| 0.2968 | 0.9651 | 455 | 1.0915 | 25029504 |
| 0.3045 | 0.9757 | 460 | 1.0936 | 25307368 |
| 0.273 | 0.9863 | 465 | 1.0917 | 25584672 |
| 0.3096 | 0.9970 | 470 | 1.0909 | 25862576 |
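The card does not state the dataset size, but the final logged row lets us estimate it: roughly 470 / 0.9970 ≈ 471 optimizer steps make up one epoch at an effective batch of 128 examples, i.e. on the order of 60k training examples averaging ~430 tokens each. A back-of-the-envelope check (all numbers read off the table above):

```python
# Back-of-the-envelope estimates from the final logged row of the table
# (step 470 at epoch 0.9970, with 25,862,576 input tokens seen).
steps, epoch, effective_batch = 470, 0.9970, 128
tokens_seen = 25_862_576

steps_per_epoch = steps / epoch                        # ~471.4 steps per epoch
n_examples = round(steps_per_epoch * effective_batch)  # ~60,341 examples
avg_tokens = tokens_seen / (steps * effective_batch)   # ~430 tokens per example
print(n_examples, round(avg_tokens))
```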

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
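For reproducibility, these can be pinned directly; a minimal requirements sketch (note that 2.4.0+cu121 denotes a CUDA 12.1 PyTorch build, which may require the matching PyTorch wheel index on some platforms):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```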
Model size

  • 2.61B params (safetensors, BF16)

Model tree

  • Base model: google/gemma-2-2b
  • This model: RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1