collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0917
  • Num Input Tokens Seen: 26240480
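For reference, here is a minimal sketch of loading this checkpoint with the `transformers` library. The repository id comes from this card; the BF16 dtype, the prompt, and the generation settings are illustrative assumptions, not documented choices.

```python
# Minimal usage sketch: load the fine-tuned checkpoint and generate text.
# The repo id comes from this card; dtype and generation settings are
# illustrative assumptions rather than documented choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```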

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
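For readers who want to replicate this setup, the values above map onto the Hugging Face `Trainer` API roughly as follows. This is a minimal sketch: the output directory and `bf16` flag are assumptions, model and dataset loading are omitted (the training data is not documented here), and the Adam betas/epsilon listed above are already the `Trainer` defaults.

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir and bf16 are assumptions; model/dataset loading is omitted
# because the training data is not documented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total (single device)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults
    # (adam_beta1, adam_beta2, adam_epsilon), so they need not be set.
    bf16=True,  # assumption: consistent with the BF16 checkpoint weights
)
```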

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.6284 | 0.0107 | 5 | 1.3782 | 278448 |
| 1.5826 | 0.0214 | 10 | 1.3029 | 562472 |
| 1.4086 | 0.0321 | 15 | 1.2330 | 839608 |
| 1.2961 | 0.0429 | 20 | 1.1864 | 1119720 |
| 1.1031 | 0.0536 | 25 | 1.1751 | 1399416 |
| 0.9367 | 0.0643 | 30 | 1.1806 | 1678736 |
| 0.9432 | 0.0750 | 35 | 1.1916 | 1964312 |
| 0.8141 | 0.0857 | 40 | 1.2028 | 2241112 |
| 0.6907 | 0.0964 | 45 | 1.2212 | 2525528 |
| 0.6315 | 0.1071 | 50 | 1.2206 | 2805344 |
| 0.6921 | 0.1179 | 55 | 1.1809 | 3094824 |
| 0.6048 | 0.1286 | 60 | 1.1891 | 3364432 |
| 0.4934 | 0.1393 | 65 | 1.1748 | 3648168 |
| 0.4218 | 0.1500 | 70 | 1.1762 | 3925368 |
| 0.4922 | 0.1607 | 75 | 1.1702 | 4204840 |
| 0.429 | 0.1714 | 80 | 1.1683 | 4486552 |
| 0.4841 | 0.1821 | 85 | 1.1619 | 4772968 |
| 0.3137 | 0.1928 | 90 | 1.1625 | 5058728 |
| 0.5367 | 0.2036 | 95 | 1.1546 | 5342896 |
| 0.481 | 0.2143 | 100 | 1.1583 | 5623272 |
| 0.398 | 0.2250 | 105 | 1.1506 | 5905184 |
| 0.277 | 0.2357 | 110 | 1.1533 | 6183096 |
| 0.3657 | 0.2464 | 115 | 1.1452 | 6468464 |
| 0.3617 | 0.2571 | 120 | 1.1471 | 6753680 |
| 0.3776 | 0.2678 | 125 | 1.1407 | 7035008 |
| 0.4071 | 0.2786 | 130 | 1.1380 | 7316016 |
| 0.3776 | 0.2893 | 135 | 1.1405 | 7598456 |
| 0.3764 | 0.3000 | 140 | 1.1348 | 7881224 |
| 0.3814 | 0.3107 | 145 | 1.1378 | 8164064 |
| 0.3856 | 0.3214 | 150 | 1.1328 | 8450760 |
| 0.4684 | 0.3321 | 155 | 1.1329 | 8738544 |
| 0.3276 | 0.3428 | 160 | 1.1322 | 9021616 |
| 0.3594 | 0.3536 | 165 | 1.1308 | 9294312 |
| 0.3287 | 0.3643 | 170 | 1.1301 | 9574680 |
| 0.3978 | 0.3750 | 175 | 1.1293 | 9855416 |
| 0.3626 | 0.3857 | 180 | 1.1270 | 10138968 |
| 0.3565 | 0.3964 | 185 | 1.1270 | 10420488 |
| 0.4081 | 0.4071 | 190 | 1.1243 | 10704944 |
| 0.3186 | 0.4178 | 195 | 1.1241 | 10979600 |
| 0.4185 | 0.4286 | 200 | 1.1224 | 11263624 |
| 0.3312 | 0.4393 | 205 | 1.1217 | 11540344 |
| 0.3759 | 0.4500 | 210 | 1.1203 | 11817640 |
| 0.2892 | 0.4607 | 215 | 1.1183 | 12102472 |
| 0.3495 | 0.4714 | 220 | 1.1206 | 12389320 |
| 0.3283 | 0.4821 | 225 | 1.1152 | 12670872 |
| 0.4334 | 0.4928 | 230 | 1.1182 | 12952952 |
| 0.363 | 0.5035 | 235 | 1.1141 | 13244768 |
| 0.3329 | 0.5143 | 240 | 1.1122 | 13527824 |
| 0.3223 | 0.5250 | 245 | 1.1152 | 13809336 |
| 0.2902 | 0.5357 | 250 | 1.1121 | 14097024 |
| 0.2979 | 0.5464 | 255 | 1.1128 | 14374696 |
| 0.4016 | 0.5571 | 260 | 1.1113 | 14653824 |
| 0.297 | 0.5678 | 265 | 1.1105 | 14935640 |
| 0.354 | 0.5785 | 270 | 1.1091 | 15209152 |
| 0.3685 | 0.5893 | 275 | 1.1074 | 15489240 |
| 0.3976 | 0.6000 | 280 | 1.1085 | 15768680 |
| 0.416 | 0.6107 | 285 | 1.1056 | 16047216 |
| 0.3145 | 0.6214 | 290 | 1.1081 | 16324680 |
| 0.1919 | 0.6321 | 295 | 1.1058 | 16605528 |
| 0.357 | 0.6428 | 300 | 1.1047 | 16893672 |
| 0.3169 | 0.6535 | 305 | 1.1052 | 17177936 |
| 0.3618 | 0.6643 | 310 | 1.1024 | 17454088 |
| 0.3471 | 0.6750 | 315 | 1.1039 | 17735808 |
| 0.3151 | 0.6857 | 320 | 1.1047 | 18016344 |
| 0.3423 | 0.6964 | 325 | 1.1026 | 18295360 |
| 0.2432 | 0.7071 | 330 | 1.1038 | 18577320 |
| 0.2787 | 0.7178 | 335 | 1.1023 | 18851072 |
| 0.3253 | 0.7285 | 340 | 1.1017 | 19133608 |
| 0.3579 | 0.7393 | 345 | 1.1025 | 19414200 |
| 0.2788 | 0.7500 | 350 | 1.1017 | 19697808 |
| 0.2742 | 0.7607 | 355 | 1.1010 | 19977824 |
| 0.3208 | 0.7714 | 360 | 1.0994 | 20257536 |
| 0.3571 | 0.7821 | 365 | 1.0983 | 20540544 |
| 0.2397 | 0.7928 | 370 | 1.0998 | 20829384 |
| 0.2371 | 0.8035 | 375 | 1.1000 | 21110504 |
| 0.3228 | 0.8142 | 380 | 1.0973 | 21392184 |
| 0.304 | 0.8250 | 385 | 1.0978 | 21672896 |
| 0.2706 | 0.8357 | 390 | 1.0990 | 21953464 |
| 0.2939 | 0.8464 | 395 | 1.0971 | 22236192 |
| 0.3252 | 0.8571 | 400 | 1.0959 | 22517408 |
| 0.3147 | 0.8678 | 405 | 1.0963 | 22802832 |
| 0.4225 | 0.8785 | 410 | 1.0956 | 23080032 |
| 0.3225 | 0.8892 | 415 | 1.0941 | 23361360 |
| 0.2575 | 0.9000 | 420 | 1.0960 | 23646040 |
| 0.3977 | 0.9107 | 425 | 1.0947 | 23930880 |
| 0.3082 | 0.9214 | 430 | 1.0965 | 24218608 |
| 0.3658 | 0.9321 | 435 | 1.0950 | 24504168 |
| 0.2867 | 0.9428 | 440 | 1.0929 | 24781640 |
| 0.3007 | 0.9535 | 445 | 1.0946 | 25059120 |
| 0.3238 | 0.9642 | 450 | 1.0941 | 25337024 |
| 0.3597 | 0.9750 | 455 | 1.0921 | 25617136 |
| 0.2523 | 0.9857 | 460 | 1.0945 | 25902840 |
| 0.2519 | 0.9964 | 465 | 1.0920 | 26185736 |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1