# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1033
- Num Input Tokens Seen: 46879648
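
For quick inference, a minimal sketch follows. The repo id is taken from this card's listing; the prompt, dtype, and generation settings are illustrative assumptions, not part of the card.

```python
# Minimal inference sketch. The repo id comes from this card; the prompt and
# generation settings are illustrative assumptions, not prescribed here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```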
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure
### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
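
For reference, here is one way these values could map onto Hugging Face `TrainingArguments` (Transformers 4.44.0). The `output_dir` is a placeholder and the original training script is not published with this card, so treat this as a sketch rather than the exact configuration.

```python
# Hedged sketch mapping the listed hyperparameters onto TrainingArguments.
# output_dir is a placeholder; the actual training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = total_train_batch_size of 128
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```

Under this mapping, the total train batch size of 128 follows from `per_device_train_batch_size=8` times `gradient_accumulation_steps=16` on a single device.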
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.5753 | 0.0058 | 5 | 1.3923 | 267568 |
1.53 | 0.0116 | 10 | 1.3634 | 543056 |
1.5557 | 0.0175 | 15 | 1.2991 | 818656 |
1.3745 | 0.0233 | 20 | 1.2495 | 1083296 |
1.37 | 0.0291 | 25 | 1.1978 | 1354864 |
1.2347 | 0.0349 | 30 | 1.1741 | 1634200 |
1.1829 | 0.0407 | 35 | 1.1812 | 1901200 |
1.0397 | 0.0465 | 40 | 1.1747 | 2171536 |
0.9146 | 0.0524 | 45 | 1.2084 | 2450072 |
0.7423 | 0.0582 | 50 | 1.2263 | 2727648 |
0.7049 | 0.0640 | 55 | 1.2532 | 3010920 |
0.6766 | 0.0698 | 60 | 1.2553 | 3293520 |
0.7333 | 0.0756 | 65 | 1.2337 | 3566112 |
0.5088 | 0.0814 | 70 | 1.2231 | 3841224 |
0.4615 | 0.0873 | 75 | 1.2312 | 4116112 |
0.4342 | 0.0931 | 80 | 1.2311 | 4390024 |
0.3359 | 0.0989 | 85 | 1.2199 | 4662072 |
0.3926 | 0.1047 | 90 | 1.2103 | 4931912 |
0.4366 | 0.1105 | 95 | 1.2067 | 5212056 |
0.3526 | 0.1163 | 100 | 1.2125 | 5482184 |
0.286 | 0.1222 | 105 | 1.2038 | 5743432 |
0.3501 | 0.1280 | 110 | 1.2010 | 6021160 |
0.3396 | 0.1338 | 115 | 1.2046 | 6298096 |
0.2977 | 0.1396 | 120 | 1.1984 | 6567624 |
0.2274 | 0.1454 | 125 | 1.1953 | 6842616 |
0.2313 | 0.1513 | 130 | 1.1938 | 7116480 |
0.2709 | 0.1571 | 135 | 1.1894 | 7391656 |
0.284 | 0.1629 | 140 | 1.1916 | 7658384 |
0.3073 | 0.1687 | 145 | 1.1820 | 7939104 |
0.2237 | 0.1745 | 150 | 1.1919 | 8210096 |
0.2312 | 0.1803 | 155 | 1.1805 | 8481048 |
0.3128 | 0.1862 | 160 | 1.1837 | 8758024 |
0.2945 | 0.1920 | 165 | 1.1817 | 9027632 |
0.2817 | 0.1978 | 170 | 1.1737 | 9300448 |
0.356 | 0.2036 | 175 | 1.1749 | 9578488 |
0.2954 | 0.2094 | 180 | 1.1715 | 9851304 |
0.3045 | 0.2152 | 185 | 1.1691 | 10115416 |
0.282 | 0.2211 | 190 | 1.1678 | 10382944 |
0.3053 | 0.2269 | 195 | 1.1697 | 10660152 |
0.2065 | 0.2327 | 200 | 1.1665 | 10940720 |
0.2118 | 0.2385 | 205 | 1.1648 | 11221464 |
0.2133 | 0.2443 | 210 | 1.1659 | 11497384 |
0.2162 | 0.2501 | 215 | 1.1653 | 11769728 |
0.2568 | 0.2560 | 220 | 1.1634 | 12048248 |
0.2813 | 0.2618 | 225 | 1.1619 | 12315600 |
0.2439 | 0.2676 | 230 | 1.1567 | 12588160 |
0.1679 | 0.2734 | 235 | 1.1618 | 12863192 |
0.2016 | 0.2792 | 240 | 1.1594 | 13130656 |
0.2964 | 0.2850 | 245 | 1.1580 | 13400608 |
0.1561 | 0.2909 | 250 | 1.1574 | 13668440 |
0.219 | 0.2967 | 255 | 1.1554 | 13943704 |
0.2607 | 0.3025 | 260 | 1.1536 | 14221768 |
0.2848 | 0.3083 | 265 | 1.1554 | 14492304 |
0.2455 | 0.3141 | 270 | 1.1531 | 14760848 |
0.372 | 0.3200 | 275 | 1.1542 | 15035936 |
0.2095 | 0.3258 | 280 | 1.1520 | 15310576 |
0.2474 | 0.3316 | 285 | 1.1532 | 15579504 |
0.3264 | 0.3374 | 290 | 1.1465 | 15854256 |
0.1844 | 0.3432 | 295 | 1.1523 | 16128872 |
0.1632 | 0.3490 | 300 | 1.1505 | 16399592 |
0.2669 | 0.3549 | 305 | 1.1456 | 16667320 |
0.2193 | 0.3607 | 310 | 1.1474 | 16941416 |
0.1967 | 0.3665 | 315 | 1.1459 | 17212144 |
0.2129 | 0.3723 | 320 | 1.1443 | 17482792 |
0.3056 | 0.3781 | 325 | 1.1444 | 17763040 |
0.1587 | 0.3839 | 330 | 1.1409 | 18032152 |
0.1836 | 0.3898 | 335 | 1.1407 | 18299920 |
0.2388 | 0.3956 | 340 | 1.1384 | 18577344 |
0.2204 | 0.4014 | 345 | 1.1370 | 18840160 |
0.1834 | 0.4072 | 350 | 1.1409 | 19112960 |
0.2406 | 0.4130 | 355 | 1.1363 | 19385312 |
0.2043 | 0.4188 | 360 | 1.1364 | 19661376 |
0.1834 | 0.4247 | 365 | 1.1376 | 19935920 |
0.2579 | 0.4305 | 370 | 1.1363 | 20210320 |
0.2246 | 0.4363 | 375 | 1.1345 | 20477424 |
0.2203 | 0.4421 | 380 | 1.1359 | 20750464 |
0.2124 | 0.4479 | 385 | 1.1362 | 21020688 |
0.2741 | 0.4538 | 390 | 1.1334 | 21291056 |
0.1375 | 0.4596 | 395 | 1.1361 | 21566192 |
0.1435 | 0.4654 | 400 | 1.1363 | 21843896 |
0.2614 | 0.4712 | 405 | 1.1319 | 22105576 |
0.2487 | 0.4770 | 410 | 1.1331 | 22375904 |
0.2255 | 0.4828 | 415 | 1.1321 | 22645976 |
0.161 | 0.4887 | 420 | 1.1329 | 22915392 |
0.217 | 0.4945 | 425 | 1.1313 | 23187664 |
0.2353 | 0.5003 | 430 | 1.1311 | 23465448 |
0.2315 | 0.5061 | 435 | 1.1310 | 23746544 |
0.2228 | 0.5119 | 440 | 1.1315 | 24018896 |
0.1554 | 0.5177 | 445 | 1.1276 | 24289048 |
0.1983 | 0.5236 | 450 | 1.1295 | 24556440 |
0.3362 | 0.5294 | 455 | 1.1269 | 24830568 |
0.2744 | 0.5352 | 460 | 1.1263 | 25101672 |
0.2374 | 0.5410 | 465 | 1.1283 | 25372920 |
0.1861 | 0.5468 | 470 | 1.1260 | 25648208 |
0.1935 | 0.5526 | 475 | 1.1257 | 25923920 |
0.3554 | 0.5585 | 480 | 1.1256 | 26202440 |
0.3118 | 0.5643 | 485 | 1.1234 | 26474632 |
0.2162 | 0.5701 | 490 | 1.1243 | 26746064 |
0.1809 | 0.5759 | 495 | 1.1244 | 27014568 |
0.221 | 0.5817 | 500 | 1.1214 | 27293400 |
0.2503 | 0.5876 | 505 | 1.1231 | 27562984 |
0.237 | 0.5934 | 510 | 1.1232 | 27839408 |
0.2327 | 0.5992 | 515 | 1.1184 | 28107568 |
0.1367 | 0.6050 | 520 | 1.1217 | 28381536 |
0.1865 | 0.6108 | 525 | 1.1262 | 28652160 |
0.1721 | 0.6166 | 530 | 1.1182 | 28928688 |
0.2373 | 0.6225 | 535 | 1.1192 | 29202088 |
0.1933 | 0.6283 | 540 | 1.1219 | 29470424 |
0.165 | 0.6341 | 545 | 1.1203 | 29741536 |
0.1975 | 0.6399 | 550 | 1.1187 | 30015232 |
0.2275 | 0.6457 | 555 | 1.1191 | 30287272 |
0.1997 | 0.6515 | 560 | 1.1204 | 30560976 |
0.0949 | 0.6574 | 565 | 1.1190 | 30838424 |
0.2994 | 0.6632 | 570 | 1.1186 | 31112016 |
0.1676 | 0.6690 | 575 | 1.1179 | 31379672 |
0.1973 | 0.6748 | 580 | 1.1187 | 31650192 |
0.1578 | 0.6806 | 585 | 1.1179 | 31918136 |
0.2202 | 0.6864 | 590 | 1.1159 | 32195280 |
0.1907 | 0.6923 | 595 | 1.1171 | 32471856 |
0.2151 | 0.6981 | 600 | 1.1173 | 32736864 |
0.1895 | 0.7039 | 605 | 1.1154 | 33013704 |
0.2138 | 0.7097 | 610 | 1.1153 | 33286536 |
0.1855 | 0.7155 | 615 | 1.1178 | 33560632 |
0.1635 | 0.7213 | 620 | 1.1146 | 33829520 |
0.2052 | 0.7272 | 625 | 1.1126 | 34108304 |
0.1611 | 0.7330 | 630 | 1.1143 | 34384344 |
0.2346 | 0.7388 | 635 | 1.1138 | 34660216 |
0.176 | 0.7446 | 640 | 1.1133 | 34929120 |
0.1957 | 0.7504 | 645 | 1.1141 | 35202480 |
0.1893 | 0.7563 | 650 | 1.1117 | 35469120 |
0.1599 | 0.7621 | 655 | 1.1157 | 35734824 |
0.2146 | 0.7679 | 660 | 1.1164 | 36006752 |
0.2293 | 0.7737 | 665 | 1.1133 | 36281976 |
0.1527 | 0.7795 | 670 | 1.1120 | 36560080 |
0.2942 | 0.7853 | 675 | 1.1121 | 36836336 |
0.2387 | 0.7912 | 680 | 1.1120 | 37111576 |
0.1984 | 0.7970 | 685 | 1.1114 | 37380104 |
0.1399 | 0.8028 | 690 | 1.1105 | 37646488 |
0.2481 | 0.8086 | 695 | 1.1136 | 37917360 |
0.1596 | 0.8144 | 700 | 1.1121 | 38194064 |
0.1548 | 0.8202 | 705 | 1.1091 | 38471880 |
0.1167 | 0.8261 | 710 | 1.1109 | 38736384 |
0.1977 | 0.8319 | 715 | 1.1099 | 39014416 |
0.1793 | 0.8377 | 720 | 1.1093 | 39283968 |
0.2611 | 0.8435 | 725 | 1.1096 | 39550808 |
0.1204 | 0.8493 | 730 | 1.1105 | 39819976 |
0.1484 | 0.8551 | 735 | 1.1111 | 40093784 |
0.184 | 0.8610 | 740 | 1.1108 | 40358800 |
0.2508 | 0.8668 | 745 | 1.1082 | 40632536 |
0.2075 | 0.8726 | 750 | 1.1107 | 40908352 |
0.1716 | 0.8784 | 755 | 1.1105 | 41185296 |
0.1733 | 0.8842 | 760 | 1.1067 | 41452552 |
0.2739 | 0.8901 | 765 | 1.1073 | 41734536 |
0.1719 | 0.8959 | 770 | 1.1073 | 42009176 |
0.2115 | 0.9017 | 775 | 1.1064 | 42278528 |
0.2295 | 0.9075 | 780 | 1.1065 | 42552496 |
0.2089 | 0.9133 | 785 | 1.1067 | 42828792 |
0.2411 | 0.9191 | 790 | 1.1046 | 43102792 |
0.1477 | 0.9250 | 795 | 1.1053 | 43381752 |
0.1934 | 0.9308 | 800 | 1.1065 | 43654696 |
0.1997 | 0.9366 | 805 | 1.1042 | 43928712 |
0.1535 | 0.9424 | 810 | 1.1038 | 44198760 |
0.2383 | 0.9482 | 815 | 1.1043 | 44473736 |
0.1897 | 0.9540 | 820 | 1.1049 | 44754496 |
0.1269 | 0.9599 | 825 | 1.1099 | 45023936 |
0.2393 | 0.9657 | 830 | 1.1065 | 45291608 |
0.2525 | 0.9715 | 835 | 1.1030 | 45563616 |
0.1696 | 0.9773 | 840 | 1.1062 | 45834224 |
0.1194 | 0.9831 | 845 | 1.1057 | 46108392 |
0.1984 | 0.9889 | 850 | 1.1030 | 46384976 |
0.2457 | 0.9948 | 855 | 1.1030 | 46660760 |
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
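
As a convenience, the snippet below compares locally installed versions against those listed above; the package names are the standard PyPI distributions, and the check itself is an assumption added here, not part of the original card.

```python
# Compare installed package versions against the ones listed on this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card used {want}")
```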