collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0953
  • Num Input Tokens Seen: 31512696
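
A minimal usage sketch (not part of the original card) for loading this checkpoint with the standard Transformers AutoModel API. The bfloat16 dtype follows the BF16 tensor type reported for the weights; the prompt is only a placeholder.

```python
# Sketch: load the fine-tuned checkpoint and run a short generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed for this repo
)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```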

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
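
The values above map directly onto Hugging Face TrainingArguments; the sketch below is a reconstruction under that assumption. The output_dir is a placeholder, and bf16 is inferred from the model's BF16 tensor type rather than stated explicitly in the card.

```python
# Sketch of the reported hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # effective batch: 8 x 16 accumulation = 128
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption, inferred from the BF16 tensor type
)
```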

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3909 0
1.6442 0.0089 5 1.3820 278856
1.6475 0.0178 10 1.3206 555112
1.4758 0.0267 15 1.2567 836016
1.2297 0.0356 20 1.2057 1125920
1.1437 0.0444 25 1.1841 1404848
1.0272 0.0533 30 1.1957 1688752
0.8967 0.0622 35 1.2130 1966072
0.7863 0.0711 40 1.2113 2237568
0.7654 0.08 45 1.2383 2519344
0.668 0.0889 50 1.2119 2797232
0.5498 0.0978 55 1.2164 3078520
0.4924 0.1067 60 1.1994 3354864
0.502 0.1156 65 1.1927 3641344
0.4306 0.1244 70 1.1936 3923864
0.4537 0.1333 75 1.1781 4209800
0.4149 0.1422 80 1.1854 4491264
0.3523 0.1511 85 1.1710 4767448
0.3391 0.16 90 1.1723 5048496
0.3477 0.1689 95 1.1642 5327992
0.3507 0.1778 100 1.1638 5609056
0.3321 0.1867 105 1.1618 5883360
0.2854 0.1956 110 1.1591 6163376
0.3745 0.2044 115 1.1553 6444888
0.3668 0.2133 120 1.1583 6728632
0.3377 0.2222 125 1.1487 7008152
0.3782 0.2311 130 1.1549 7282168
0.3287 0.24 135 1.1461 7559824
0.3681 0.2489 140 1.1483 7838312
0.2605 0.2578 145 1.1456 8118032
0.2678 0.2667 150 1.1411 8392096
0.3602 0.2756 155 1.1474 8674560
0.3069 0.2844 160 1.1387 8947032
0.3192 0.2933 165 1.1411 9222240
0.3828 0.3022 170 1.1382 9501128
0.179 0.3111 175 1.1384 9779776
0.3228 0.32 180 1.1375 10056488
0.3182 0.3289 185 1.1371 10331920
0.2623 0.3378 190 1.1346 10614768
0.3908 0.3467 195 1.1352 10903104
0.4084 0.3556 200 1.1310 11176968
0.2535 0.3644 205 1.1288 11462496
0.2713 0.3733 210 1.1326 11741232
0.2936 0.3822 215 1.1268 12020072
0.3277 0.3911 220 1.1267 12296064
0.3603 0.4 225 1.1277 12573368
0.2912 0.4089 230 1.1226 12851992
0.2475 0.4178 235 1.1249 13134768
0.3164 0.4267 240 1.1220 13415800
0.2098 0.4356 245 1.1233 13697896
0.2824 0.4444 250 1.1196 13971120
0.2863 0.4533 255 1.1197 14250744
0.3098 0.4622 260 1.1204 14533144
0.3439 0.4711 265 1.1174 14808272
0.336 0.48 270 1.1176 15092864
0.3359 0.4889 275 1.1181 15375104
0.2731 0.4978 280 1.1157 15657480
0.2818 0.5067 285 1.1157 15940656
0.3306 0.5156 290 1.1137 16225416
0.2837 0.5244 295 1.1142 16512184
0.3606 0.5333 300 1.1107 16796568
0.3058 0.5422 305 1.1121 17078072
0.3259 0.5511 310 1.1125 17362648
0.2235 0.56 315 1.1094 17647312
0.2725 0.5689 320 1.1082 17928848
0.3108 0.5778 325 1.1103 18205136
0.2642 0.5867 330 1.1092 18487016
0.2774 0.5956 335 1.1074 18770560
0.2155 0.6044 340 1.1070 19046272
0.234 0.6133 345 1.1091 19324080
0.2968 0.6222 350 1.1073 19609160
0.3449 0.6311 355 1.1054 19886296
0.3334 0.64 360 1.1060 20170488
0.2927 0.6489 365 1.1058 20452192
0.2632 0.6578 370 1.1031 20728320
0.2462 0.6667 375 1.1091 21015688
0.2949 0.6756 380 1.1056 21289616
0.2476 0.6844 385 1.1045 21555880
0.2329 0.6933 390 1.1046 21837392
0.2887 0.7022 395 1.1049 22118704
0.3022 0.7111 400 1.1033 22401016
0.2871 0.72 405 1.1013 22688808
0.2822 0.7289 410 1.1028 22967416
0.3034 0.7378 415 1.1028 23255720
0.3235 0.7467 420 1.1016 23544352
0.42 0.7556 425 1.1006 23825720
0.2494 0.7644 430 1.0996 24104072
0.2431 0.7733 435 1.1016 24378016
0.2956 0.7822 440 1.1003 24654072
0.2935 0.7911 445 1.1007 24934896
0.3467 0.8 450 1.0990 25218096
0.317 0.8089 455 1.0980 25498184
0.3065 0.8178 460 1.1002 25778064
0.2169 0.8267 465 1.1002 26058096
0.2623 0.8356 470 1.0994 26332744
0.258 0.8444 475 1.0967 26620248
0.1981 0.8533 480 1.0967 26906832
0.2399 0.8622 485 1.0976 27177320
0.3677 0.8711 490 1.0970 27455904
0.2889 0.88 495 1.0962 27741312
0.3128 0.8889 500 1.0967 28018736
0.2875 0.8978 505 1.0961 28299576
0.2512 0.9067 510 1.0953 28578336
0.3189 0.9156 515 1.0952 28853520
0.2676 0.9244 520 1.0968 29137216
0.3755 0.9333 525 1.0940 29424376
0.3404 0.9422 530 1.0931 29709304
0.2534 0.9511 535 1.0954 29995312
0.2709 0.96 540 1.0934 30284712
0.2448 0.9689 545 1.0929 30562744
0.2625 0.9778 550 1.0948 30837288
0.3507 0.9867 555 1.0930 31118808
0.2675 0.9956 560 1.0942 31401384

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
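
As a convenience (not part of the original card), the snippet below checks an installed environment against these versions; nothing beyond the four libraries listed above is assumed.

```python
# Verify that installed library versions match those used for training.
import datasets, tokenizers, torch, transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"MISMATCH (card used {want})"
    print(f"{name}: {have} {status}")
```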

Model size: 2.61B params (Safetensors, tensor type BF16)
