# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0917
- Num Input Tokens Seen: 26240480
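
The card does not document intended usage, so as a hedged sketch (assuming the checkpoint behaves like any other causal LM checkpoint on the Hub), it can be loaded with the Transformers AutoClasses:

```python
# Minimal usage sketch (not from the original card); adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```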
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
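
These values map directly onto the Hugging Face `TrainingArguments`; the sketch below reproduces only the listed arguments, since the dataset, model setup, and data collator are not documented in this card.

```python
# Hedged reproduction sketch of the hyperparameters listed above.
# Only TrainingArguments are shown; dataset and model setup are undocumented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=0,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```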
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6284 | 0.0107 | 5 | 1.3782 | 278448 |
1.5826 | 0.0214 | 10 | 1.3029 | 562472 |
1.4086 | 0.0321 | 15 | 1.2330 | 839608 |
1.2961 | 0.0429 | 20 | 1.1864 | 1119720 |
1.1031 | 0.0536 | 25 | 1.1751 | 1399416 |
0.9367 | 0.0643 | 30 | 1.1806 | 1678736 |
0.9432 | 0.0750 | 35 | 1.1916 | 1964312 |
0.8141 | 0.0857 | 40 | 1.2028 | 2241112 |
0.6907 | 0.0964 | 45 | 1.2212 | 2525528 |
0.6315 | 0.1071 | 50 | 1.2206 | 2805344 |
0.6921 | 0.1179 | 55 | 1.1809 | 3094824 |
0.6048 | 0.1286 | 60 | 1.1891 | 3364432 |
0.4934 | 0.1393 | 65 | 1.1748 | 3648168 |
0.4218 | 0.1500 | 70 | 1.1762 | 3925368 |
0.4922 | 0.1607 | 75 | 1.1702 | 4204840 |
0.429 | 0.1714 | 80 | 1.1683 | 4486552 |
0.4841 | 0.1821 | 85 | 1.1619 | 4772968 |
0.3137 | 0.1928 | 90 | 1.1625 | 5058728 |
0.5367 | 0.2036 | 95 | 1.1546 | 5342896 |
0.481 | 0.2143 | 100 | 1.1583 | 5623272 |
0.398 | 0.2250 | 105 | 1.1506 | 5905184 |
0.277 | 0.2357 | 110 | 1.1533 | 6183096 |
0.3657 | 0.2464 | 115 | 1.1452 | 6468464 |
0.3617 | 0.2571 | 120 | 1.1471 | 6753680 |
0.3776 | 0.2678 | 125 | 1.1407 | 7035008 |
0.4071 | 0.2786 | 130 | 1.1380 | 7316016 |
0.3776 | 0.2893 | 135 | 1.1405 | 7598456 |
0.3764 | 0.3000 | 140 | 1.1348 | 7881224 |
0.3814 | 0.3107 | 145 | 1.1378 | 8164064 |
0.3856 | 0.3214 | 150 | 1.1328 | 8450760 |
0.4684 | 0.3321 | 155 | 1.1329 | 8738544 |
0.3276 | 0.3428 | 160 | 1.1322 | 9021616 |
0.3594 | 0.3536 | 165 | 1.1308 | 9294312 |
0.3287 | 0.3643 | 170 | 1.1301 | 9574680 |
0.3978 | 0.3750 | 175 | 1.1293 | 9855416 |
0.3626 | 0.3857 | 180 | 1.1270 | 10138968 |
0.3565 | 0.3964 | 185 | 1.1270 | 10420488 |
0.4081 | 0.4071 | 190 | 1.1243 | 10704944 |
0.3186 | 0.4178 | 195 | 1.1241 | 10979600 |
0.4185 | 0.4286 | 200 | 1.1224 | 11263624 |
0.3312 | 0.4393 | 205 | 1.1217 | 11540344 |
0.3759 | 0.4500 | 210 | 1.1203 | 11817640 |
0.2892 | 0.4607 | 215 | 1.1183 | 12102472 |
0.3495 | 0.4714 | 220 | 1.1206 | 12389320 |
0.3283 | 0.4821 | 225 | 1.1152 | 12670872 |
0.4334 | 0.4928 | 230 | 1.1182 | 12952952 |
0.363 | 0.5035 | 235 | 1.1141 | 13244768 |
0.3329 | 0.5143 | 240 | 1.1122 | 13527824 |
0.3223 | 0.5250 | 245 | 1.1152 | 13809336 |
0.2902 | 0.5357 | 250 | 1.1121 | 14097024 |
0.2979 | 0.5464 | 255 | 1.1128 | 14374696 |
0.4016 | 0.5571 | 260 | 1.1113 | 14653824 |
0.297 | 0.5678 | 265 | 1.1105 | 14935640 |
0.354 | 0.5785 | 270 | 1.1091 | 15209152 |
0.3685 | 0.5893 | 275 | 1.1074 | 15489240 |
0.3976 | 0.6000 | 280 | 1.1085 | 15768680 |
0.416 | 0.6107 | 285 | 1.1056 | 16047216 |
0.3145 | 0.6214 | 290 | 1.1081 | 16324680 |
0.1919 | 0.6321 | 295 | 1.1058 | 16605528 |
0.357 | 0.6428 | 300 | 1.1047 | 16893672 |
0.3169 | 0.6535 | 305 | 1.1052 | 17177936 |
0.3618 | 0.6643 | 310 | 1.1024 | 17454088 |
0.3471 | 0.6750 | 315 | 1.1039 | 17735808 |
0.3151 | 0.6857 | 320 | 1.1047 | 18016344 |
0.3423 | 0.6964 | 325 | 1.1026 | 18295360 |
0.2432 | 0.7071 | 330 | 1.1038 | 18577320 |
0.2787 | 0.7178 | 335 | 1.1023 | 18851072 |
0.3253 | 0.7285 | 340 | 1.1017 | 19133608 |
0.3579 | 0.7393 | 345 | 1.1025 | 19414200 |
0.2788 | 0.7500 | 350 | 1.1017 | 19697808 |
0.2742 | 0.7607 | 355 | 1.1010 | 19977824 |
0.3208 | 0.7714 | 360 | 1.0994 | 20257536 |
0.3571 | 0.7821 | 365 | 1.0983 | 20540544 |
0.2397 | 0.7928 | 370 | 1.0998 | 20829384 |
0.2371 | 0.8035 | 375 | 1.1000 | 21110504 |
0.3228 | 0.8142 | 380 | 1.0973 | 21392184 |
0.304 | 0.8250 | 385 | 1.0978 | 21672896 |
0.2706 | 0.8357 | 390 | 1.0990 | 21953464 |
0.2939 | 0.8464 | 395 | 1.0971 | 22236192 |
0.3252 | 0.8571 | 400 | 1.0959 | 22517408 |
0.3147 | 0.8678 | 405 | 1.0963 | 22802832 |
0.4225 | 0.8785 | 410 | 1.0956 | 23080032 |
0.3225 | 0.8892 | 415 | 1.0941 | 23361360 |
0.2575 | 0.9000 | 420 | 1.0960 | 23646040 |
0.3977 | 0.9107 | 425 | 1.0947 | 23930880 |
0.3082 | 0.9214 | 430 | 1.0965 | 24218608 |
0.3658 | 0.9321 | 435 | 1.0950 | 24504168 |
0.2867 | 0.9428 | 440 | 1.0929 | 24781640 |
0.3007 | 0.9535 | 445 | 1.0946 | 25059120 |
0.3238 | 0.9642 | 450 | 1.0941 | 25337024 |
0.3597 | 0.9750 | 455 | 1.0921 | 25617136 |
0.2523 | 0.9857 | 460 | 1.0945 | 25902840 |
0.2519 | 0.9964 | 465 | 1.0920 | 26185736 |
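
The validation losses above are mean token-level cross-entropies (in nats), so an implied perplexity can be derived from the final reported loss; the figure below is illustrative and is not a metric reported by the card.

```python
# Illustrative only: perplexity implied by the final reported eval loss.
import math

final_eval_loss = 1.0917
print(f"perplexity ≈ {math.exp(final_eval_loss):.2f}")  # ≈ 2.98
```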
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
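
To match the training environment, the installed versions can be compared against those listed above; a small sketch using the standard library:

```python
# Compare installed package versions against those listed in the card.
from importlib.metadata import version

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for pkg, listed in expected.items():
    print(f"{pkg}: installed {version(pkg)}, card lists {listed}")
```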