# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.0953
- Num Input Tokens Seen: 31512696
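
This card does not include usage code, so here is a minimal inference sketch with `transformers`. It assumes the checkpoint is published under the repo id matching this card's title (`RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0`); the prompt and generation settings are placeholders.

```python
# Minimal inference sketch; the repo id is taken from this card's title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: the usual dtype for Gemma 2 checkpoints
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```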
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
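
As a rough reconstruction, these settings map onto `transformers.TrainingArguments` approximately as follows. This is a sketch, not the original training script (which is not included in this card); the `output_dir` is a placeholder.

```python
# Approximate TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total (implies a single device)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```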
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6442 | 0.0089 | 5 | 1.3820 | 278856 |
1.6475 | 0.0178 | 10 | 1.3206 | 555112 |
1.4758 | 0.0267 | 15 | 1.2567 | 836016 |
1.2297 | 0.0356 | 20 | 1.2057 | 1125920 |
1.1437 | 0.0444 | 25 | 1.1841 | 1404848 |
1.0272 | 0.0533 | 30 | 1.1957 | 1688752 |
0.8967 | 0.0622 | 35 | 1.2130 | 1966072 |
0.7863 | 0.0711 | 40 | 1.2113 | 2237568 |
0.7654 | 0.08 | 45 | 1.2383 | 2519344 |
0.668 | 0.0889 | 50 | 1.2119 | 2797232 |
0.5498 | 0.0978 | 55 | 1.2164 | 3078520 |
0.4924 | 0.1067 | 60 | 1.1994 | 3354864 |
0.502 | 0.1156 | 65 | 1.1927 | 3641344 |
0.4306 | 0.1244 | 70 | 1.1936 | 3923864 |
0.4537 | 0.1333 | 75 | 1.1781 | 4209800 |
0.4149 | 0.1422 | 80 | 1.1854 | 4491264 |
0.3523 | 0.1511 | 85 | 1.1710 | 4767448 |
0.3391 | 0.16 | 90 | 1.1723 | 5048496 |
0.3477 | 0.1689 | 95 | 1.1642 | 5327992 |
0.3507 | 0.1778 | 100 | 1.1638 | 5609056 |
0.3321 | 0.1867 | 105 | 1.1618 | 5883360 |
0.2854 | 0.1956 | 110 | 1.1591 | 6163376 |
0.3745 | 0.2044 | 115 | 1.1553 | 6444888 |
0.3668 | 0.2133 | 120 | 1.1583 | 6728632 |
0.3377 | 0.2222 | 125 | 1.1487 | 7008152 |
0.3782 | 0.2311 | 130 | 1.1549 | 7282168 |
0.3287 | 0.24 | 135 | 1.1461 | 7559824 |
0.3681 | 0.2489 | 140 | 1.1483 | 7838312 |
0.2605 | 0.2578 | 145 | 1.1456 | 8118032 |
0.2678 | 0.2667 | 150 | 1.1411 | 8392096 |
0.3602 | 0.2756 | 155 | 1.1474 | 8674560 |
0.3069 | 0.2844 | 160 | 1.1387 | 8947032 |
0.3192 | 0.2933 | 165 | 1.1411 | 9222240 |
0.3828 | 0.3022 | 170 | 1.1382 | 9501128 |
0.179 | 0.3111 | 175 | 1.1384 | 9779776 |
0.3228 | 0.32 | 180 | 1.1375 | 10056488 |
0.3182 | 0.3289 | 185 | 1.1371 | 10331920 |
0.2623 | 0.3378 | 190 | 1.1346 | 10614768 |
0.3908 | 0.3467 | 195 | 1.1352 | 10903104 |
0.4084 | 0.3556 | 200 | 1.1310 | 11176968 |
0.2535 | 0.3644 | 205 | 1.1288 | 11462496 |
0.2713 | 0.3733 | 210 | 1.1326 | 11741232 |
0.2936 | 0.3822 | 215 | 1.1268 | 12020072 |
0.3277 | 0.3911 | 220 | 1.1267 | 12296064 |
0.3603 | 0.4 | 225 | 1.1277 | 12573368 |
0.2912 | 0.4089 | 230 | 1.1226 | 12851992 |
0.2475 | 0.4178 | 235 | 1.1249 | 13134768 |
0.3164 | 0.4267 | 240 | 1.1220 | 13415800 |
0.2098 | 0.4356 | 245 | 1.1233 | 13697896 |
0.2824 | 0.4444 | 250 | 1.1196 | 13971120 |
0.2863 | 0.4533 | 255 | 1.1197 | 14250744 |
0.3098 | 0.4622 | 260 | 1.1204 | 14533144 |
0.3439 | 0.4711 | 265 | 1.1174 | 14808272 |
0.336 | 0.48 | 270 | 1.1176 | 15092864 |
0.3359 | 0.4889 | 275 | 1.1181 | 15375104 |
0.2731 | 0.4978 | 280 | 1.1157 | 15657480 |
0.2818 | 0.5067 | 285 | 1.1157 | 15940656 |
0.3306 | 0.5156 | 290 | 1.1137 | 16225416 |
0.2837 | 0.5244 | 295 | 1.1142 | 16512184 |
0.3606 | 0.5333 | 300 | 1.1107 | 16796568 |
0.3058 | 0.5422 | 305 | 1.1121 | 17078072 |
0.3259 | 0.5511 | 310 | 1.1125 | 17362648 |
0.2235 | 0.56 | 315 | 1.1094 | 17647312 |
0.2725 | 0.5689 | 320 | 1.1082 | 17928848 |
0.3108 | 0.5778 | 325 | 1.1103 | 18205136 |
0.2642 | 0.5867 | 330 | 1.1092 | 18487016 |
0.2774 | 0.5956 | 335 | 1.1074 | 18770560 |
0.2155 | 0.6044 | 340 | 1.1070 | 19046272 |
0.234 | 0.6133 | 345 | 1.1091 | 19324080 |
0.2968 | 0.6222 | 350 | 1.1073 | 19609160 |
0.3449 | 0.6311 | 355 | 1.1054 | 19886296 |
0.3334 | 0.64 | 360 | 1.1060 | 20170488 |
0.2927 | 0.6489 | 365 | 1.1058 | 20452192 |
0.2632 | 0.6578 | 370 | 1.1031 | 20728320 |
0.2462 | 0.6667 | 375 | 1.1091 | 21015688 |
0.2949 | 0.6756 | 380 | 1.1056 | 21289616 |
0.2476 | 0.6844 | 385 | 1.1045 | 21555880 |
0.2329 | 0.6933 | 390 | 1.1046 | 21837392 |
0.2887 | 0.7022 | 395 | 1.1049 | 22118704 |
0.3022 | 0.7111 | 400 | 1.1033 | 22401016 |
0.2871 | 0.72 | 405 | 1.1013 | 22688808 |
0.2822 | 0.7289 | 410 | 1.1028 | 22967416 |
0.3034 | 0.7378 | 415 | 1.1028 | 23255720 |
0.3235 | 0.7467 | 420 | 1.1016 | 23544352 |
0.42 | 0.7556 | 425 | 1.1006 | 23825720 |
0.2494 | 0.7644 | 430 | 1.0996 | 24104072 |
0.2431 | 0.7733 | 435 | 1.1016 | 24378016 |
0.2956 | 0.7822 | 440 | 1.1003 | 24654072 |
0.2935 | 0.7911 | 445 | 1.1007 | 24934896 |
0.3467 | 0.8 | 450 | 1.0990 | 25218096 |
0.317 | 0.8089 | 455 | 1.0980 | 25498184 |
0.3065 | 0.8178 | 460 | 1.1002 | 25778064 |
0.2169 | 0.8267 | 465 | 1.1002 | 26058096 |
0.2623 | 0.8356 | 470 | 1.0994 | 26332744 |
0.258 | 0.8444 | 475 | 1.0967 | 26620248 |
0.1981 | 0.8533 | 480 | 1.0967 | 26906832 |
0.2399 | 0.8622 | 485 | 1.0976 | 27177320 |
0.3677 | 0.8711 | 490 | 1.0970 | 27455904 |
0.2889 | 0.88 | 495 | 1.0962 | 27741312 |
0.3128 | 0.8889 | 500 | 1.0967 | 28018736 |
0.2875 | 0.8978 | 505 | 1.0961 | 28299576 |
0.2512 | 0.9067 | 510 | 1.0953 | 28578336 |
0.3189 | 0.9156 | 515 | 1.0952 | 28853520 |
0.2676 | 0.9244 | 520 | 1.0968 | 29137216 |
0.3755 | 0.9333 | 525 | 1.0940 | 29424376 |
0.3404 | 0.9422 | 530 | 1.0931 | 29709304 |
0.2534 | 0.9511 | 535 | 1.0954 | 29995312 |
0.2709 | 0.96 | 540 | 1.0934 | 30284712 |
0.2448 | 0.9689 | 545 | 1.0929 | 30562744 |
0.2625 | 0.9778 | 550 | 1.0948 | 30837288 |
0.3507 | 0.9867 | 555 | 1.0930 | 31118808 |
0.2675 | 0.9956 | 560 | 1.0942 | 31401384 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
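
As a small convenience (an addition to this card, not part of the original), a sketch that compares an installed environment against the versions listed above:

```python
# Report installed versions of the libraries pinned in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    note = "matches card" if have == want else f"card lists {want}"
    print(f"{name}: {have} ({note})")
```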