collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1117
- Num Input Tokens Seen: 54862728
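For readers who prefer perplexity to raw loss, the reported evaluation loss can be converted as in the sketch below. It assumes the loss is a mean per-token cross-entropy measured in nats, which the card does not state explicitly.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.1117

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the final checkpoint has a perplexity of roughly 3 on the evaluation set.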
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
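The relationship between these values can be checked with a little arithmetic. The sketch below is illustrative only: the single-device setting and the total step count (taken from the last row of the results table) are assumptions, and `lr_at` is a hypothetical helper that mimics a constant schedule with linear warmup rather than the scheduler code actually used.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 16
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumptions, not stated in the card: one device, and 1010 optimizer
# steps in total (the last step logged in the results table).
num_devices = 1
total_steps = 1010

# Effective batch size per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Number of warmup steps implied by the warmup ratio.
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup to the base rate, then constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, warmup_steps, lr_at(25), lr_at(500))
```

With one device, 8 x 16 reproduces the listed total_train_batch_size of 128, which suggests (but does not prove) single-device training.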
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.7236 | 0.0049 | 5 | 1.3936 | 265256 |
1.7311 | 0.0099 | 10 | 1.3714 | 529168 |
1.7068 | 0.0148 | 15 | 1.3184 | 796184 |
1.4504 | 0.0198 | 20 | 1.2644 | 1069504 |
1.3511 | 0.0247 | 25 | 1.2207 | 1343872 |
1.2544 | 0.0297 | 30 | 1.1828 | 1618456 |
1.2515 | 0.0346 | 35 | 1.1715 | 1884184 |
1.2252 | 0.0396 | 40 | 1.1675 | 2151456 |
1.1025 | 0.0445 | 45 | 1.1678 | 2433096 |
1.0027 | 0.0495 | 50 | 1.1923 | 2703168 |
0.8733 | 0.0544 | 55 | 1.2430 | 2972760 |
0.9613 | 0.0594 | 60 | 1.2412 | 3245272 |
0.7516 | 0.0643 | 65 | 1.2432 | 3527240 |
0.725 | 0.0693 | 70 | 1.2423 | 3800664 |
0.5538 | 0.0742 | 75 | 1.2425 | 4073744 |
0.5691 | 0.0792 | 80 | 1.2359 | 4344752 |
0.5045 | 0.0841 | 85 | 1.2374 | 4613944 |
0.4573 | 0.0891 | 90 | 1.2367 | 4882056 |
0.4547 | 0.0940 | 95 | 1.2425 | 5156560 |
0.3726 | 0.0989 | 100 | 1.2205 | 5434880 |
0.4008 | 0.1039 | 105 | 1.2563 | 5702128 |
0.4398 | 0.1088 | 110 | 1.2211 | 5970840 |
0.4661 | 0.1138 | 115 | 1.2340 | 6246272 |
0.3855 | 0.1187 | 120 | 1.2227 | 6519744 |
0.3038 | 0.1237 | 125 | 1.2171 | 6800648 |
0.3275 | 0.1286 | 130 | 1.2151 | 7070088 |
0.3554 | 0.1336 | 135 | 1.2065 | 7337640 |
0.334 | 0.1385 | 140 | 1.2135 | 7612368 |
0.3194 | 0.1435 | 145 | 1.2118 | 7885880 |
0.3137 | 0.1484 | 150 | 1.2113 | 8158848 |
0.269 | 0.1534 | 155 | 1.2168 | 8429632 |
0.2767 | 0.1583 | 160 | 1.2060 | 8695800 |
0.2308 | 0.1633 | 165 | 1.2081 | 8965416 |
0.3005 | 0.1682 | 170 | 1.2097 | 9235448 |
0.3053 | 0.1732 | 175 | 1.2008 | 9501016 |
0.2627 | 0.1781 | 180 | 1.2050 | 9769336 |
0.3102 | 0.1831 | 185 | 1.1977 | 10039440 |
0.2434 | 0.1880 | 190 | 1.1970 | 10315680 |
0.2099 | 0.1929 | 195 | 1.1956 | 10593112 |
0.2217 | 0.1979 | 200 | 1.1947 | 10858264 |
0.3017 | 0.2028 | 205 | 1.1948 | 11129712 |
0.3016 | 0.2078 | 210 | 1.1907 | 11391368 |
0.2341 | 0.2127 | 215 | 1.1960 | 11671592 |
0.2846 | 0.2177 | 220 | 1.1854 | 11942936 |
0.2321 | 0.2226 | 225 | 1.1937 | 12216472 |
0.2581 | 0.2276 | 230 | 1.1934 | 12489632 |
0.3464 | 0.2325 | 235 | 1.1973 | 12762864 |
0.3527 | 0.2375 | 240 | 1.1906 | 13040536 |
0.2507 | 0.2424 | 245 | 1.1935 | 13313504 |
0.2061 | 0.2474 | 250 | 1.1851 | 13583408 |
0.3266 | 0.2523 | 255 | 1.1831 | 13850728 |
0.4595 | 0.2573 | 260 | 1.1863 | 14124576 |
0.2244 | 0.2622 | 265 | 1.1841 | 14398448 |
0.2672 | 0.2672 | 270 | 1.1829 | 14667184 |
0.2541 | 0.2721 | 275 | 1.1854 | 14941048 |
0.1679 | 0.2771 | 280 | 1.1851 | 15204600 |
0.1725 | 0.2820 | 285 | 1.1783 | 15480600 |
0.1721 | 0.2870 | 290 | 1.1806 | 15746904 |
0.281 | 0.2919 | 295 | 1.1750 | 16026392 |
0.2155 | 0.2968 | 300 | 1.1780 | 16291224 |
0.169 | 0.3018 | 305 | 1.1738 | 16559872 |
0.3579 | 0.3067 | 310 | 1.1797 | 16828144 |
0.2431 | 0.3117 | 315 | 1.1706 | 17096176 |
0.2496 | 0.3166 | 320 | 1.1731 | 17363720 |
0.2482 | 0.3216 | 325 | 1.1718 | 17633640 |
0.2215 | 0.3265 | 330 | 1.1728 | 17905448 |
0.263 | 0.3315 | 335 | 1.1684 | 18177864 |
0.1697 | 0.3364 | 340 | 1.1680 | 18453760 |
0.2254 | 0.3414 | 345 | 1.1685 | 18727584 |
0.2537 | 0.3463 | 350 | 1.1671 | 18996744 |
0.1607 | 0.3513 | 355 | 1.1692 | 19260984 |
0.1744 | 0.3562 | 360 | 1.1624 | 19528624 |
0.1572 | 0.3612 | 365 | 1.1659 | 19805200 |
0.2199 | 0.3661 | 370 | 1.1687 | 20082016 |
0.2309 | 0.3711 | 375 | 1.1616 | 20354376 |
0.2652 | 0.3760 | 380 | 1.1637 | 20626344 |
0.1892 | 0.3810 | 385 | 1.1604 | 20899232 |
0.2646 | 0.3859 | 390 | 1.1577 | 21175128 |
0.2623 | 0.3908 | 395 | 1.1575 | 21440072 |
0.2045 | 0.3958 | 400 | 1.1554 | 21710088 |
0.2057 | 0.4007 | 405 | 1.1542 | 21980272 |
0.177 | 0.4057 | 410 | 1.1547 | 22247080 |
0.1791 | 0.4106 | 415 | 1.1558 | 22519520 |
0.147 | 0.4156 | 420 | 1.1538 | 22791800 |
0.181 | 0.4205 | 425 | 1.1581 | 23060096 |
0.1925 | 0.4255 | 430 | 1.1542 | 23327888 |
0.226 | 0.4304 | 435 | 1.1546 | 23605640 |
0.2219 | 0.4354 | 440 | 1.1531 | 23873272 |
0.1997 | 0.4403 | 445 | 1.1515 | 24142160 |
0.2017 | 0.4453 | 450 | 1.1503 | 24408600 |
0.2191 | 0.4502 | 455 | 1.1489 | 24685024 |
0.1724 | 0.4552 | 460 | 1.1469 | 24957864 |
0.2203 | 0.4601 | 465 | 1.1483 | 25227120 |
0.2019 | 0.4651 | 470 | 1.1479 | 25495120 |
0.2099 | 0.4700 | 475 | 1.1453 | 25767128 |
0.241 | 0.4750 | 480 | 1.1447 | 26045272 |
0.1307 | 0.4799 | 485 | 1.1476 | 26323032 |
0.1545 | 0.4848 | 490 | 1.1466 | 26592416 |
0.1234 | 0.4898 | 495 | 1.1474 | 26858488 |
0.2571 | 0.4947 | 500 | 1.1487 | 27124856 |
0.1971 | 0.4997 | 505 | 1.1439 | 27397920 |
0.1973 | 0.5046 | 510 | 1.1430 | 27673136 |
0.1017 | 0.5096 | 515 | 1.1430 | 27940240 |
0.1398 | 0.5145 | 520 | 1.1435 | 28210584 |
0.23 | 0.5195 | 525 | 1.1442 | 28481592 |
0.2157 | 0.5244 | 530 | 1.1407 | 28751960 |
0.188 | 0.5294 | 535 | 1.1424 | 29032104 |
0.1906 | 0.5343 | 540 | 1.1449 | 29308024 |
0.2073 | 0.5393 | 545 | 1.1410 | 29572512 |
0.1434 | 0.5442 | 550 | 1.1409 | 29841968 |
0.2084 | 0.5492 | 555 | 1.1390 | 30114568 |
0.1681 | 0.5541 | 560 | 1.1375 | 30389328 |
0.1294 | 0.5591 | 565 | 1.1382 | 30663240 |
0.3395 | 0.5640 | 570 | 1.1378 | 30936928 |
0.1858 | 0.5690 | 575 | 1.1371 | 31205160 |
0.1672 | 0.5739 | 580 | 1.1371 | 31475368 |
0.1655 | 0.5788 | 585 | 1.1349 | 31754816 |
0.225 | 0.5838 | 590 | 1.1393 | 32025488 |
0.1848 | 0.5887 | 595 | 1.1365 | 32296504 |
0.1721 | 0.5937 | 600 | 1.1360 | 32568200 |
0.2217 | 0.5986 | 605 | 1.1389 | 32838328 |
0.1805 | 0.6036 | 610 | 1.1340 | 33109144 |
0.1842 | 0.6085 | 615 | 1.1356 | 33383840 |
0.2154 | 0.6135 | 620 | 1.1379 | 33653192 |
0.1544 | 0.6184 | 625 | 1.1345 | 33923880 |
0.15 | 0.6234 | 630 | 1.1345 | 34199032 |
0.2598 | 0.6283 | 635 | 1.1399 | 34474616 |
0.1512 | 0.6333 | 640 | 1.1339 | 34738176 |
0.1904 | 0.6382 | 645 | 1.1327 | 35007928 |
0.1674 | 0.6432 | 650 | 1.1337 | 35282072 |
0.2378 | 0.6481 | 655 | 1.1323 | 35560808 |
0.2768 | 0.6531 | 660 | 1.1310 | 35830608 |
0.1568 | 0.6580 | 665 | 1.1303 | 36099152 |
0.1588 | 0.6630 | 670 | 1.1319 | 36368888 |
0.1512 | 0.6679 | 675 | 1.1304 | 36643144 |
0.1405 | 0.6729 | 680 | 1.1287 | 36915576 |
0.1606 | 0.6778 | 685 | 1.1305 | 37188760 |
0.2743 | 0.6827 | 690 | 1.1299 | 37464904 |
0.2031 | 0.6877 | 695 | 1.1283 | 37735024 |
0.231 | 0.6926 | 700 | 1.1300 | 38009432 |
0.2176 | 0.6976 | 705 | 1.1279 | 38279672 |
0.168 | 0.7025 | 710 | 1.1283 | 38551560 |
0.2019 | 0.7075 | 715 | 1.1283 | 38819848 |
0.1824 | 0.7124 | 720 | 1.1266 | 39098320 |
0.1796 | 0.7174 | 725 | 1.1301 | 39369560 |
0.1729 | 0.7223 | 730 | 1.1279 | 39641720 |
0.1295 | 0.7273 | 735 | 1.1261 | 39910968 |
0.1952 | 0.7322 | 740 | 1.1287 | 40184432 |
0.199 | 0.7372 | 745 | 1.1257 | 40459144 |
0.2263 | 0.7421 | 750 | 1.1250 | 40731824 |
0.1827 | 0.7471 | 755 | 1.1241 | 41007352 |
0.2208 | 0.7520 | 760 | 1.1239 | 41285568 |
0.1647 | 0.7570 | 765 | 1.1269 | 41555600 |
0.1852 | 0.7619 | 770 | 1.1255 | 41828768 |
0.144 | 0.7669 | 775 | 1.1229 | 42093936 |
0.1777 | 0.7718 | 780 | 1.1250 | 42364320 |
0.1588 | 0.7767 | 785 | 1.1231 | 42641592 |
0.1641 | 0.7817 | 790 | 1.1227 | 42908024 |
0.2053 | 0.7866 | 795 | 1.1227 | 43174304 |
0.2087 | 0.7916 | 800 | 1.1205 | 43450320 |
0.1329 | 0.7965 | 805 | 1.1225 | 43725176 |
0.2402 | 0.8015 | 810 | 1.1220 | 43999000 |
0.199 | 0.8064 | 815 | 1.1183 | 44268504 |
0.1698 | 0.8114 | 820 | 1.1174 | 44536976 |
0.1965 | 0.8163 | 825 | 1.1181 | 44802256 |
0.2117 | 0.8213 | 830 | 1.1200 | 45077072 |
0.233 | 0.8262 | 835 | 1.1182 | 45342240 |
0.1588 | 0.8312 | 840 | 1.1198 | 45621432 |
0.1998 | 0.8361 | 845 | 1.1182 | 45892288 |
0.1661 | 0.8411 | 850 | 1.1197 | 46165816 |
0.1791 | 0.8460 | 855 | 1.1206 | 46442088 |
0.2373 | 0.8510 | 860 | 1.1169 | 46719776 |
0.1832 | 0.8559 | 865 | 1.1153 | 46988272 |
0.1202 | 0.8609 | 870 | 1.1187 | 47259640 |
0.1519 | 0.8658 | 875 | 1.1163 | 47525952 |
0.1704 | 0.8707 | 880 | 1.1149 | 47789840 |
0.2459 | 0.8757 | 885 | 1.1145 | 48067344 |
0.2517 | 0.8806 | 890 | 1.1131 | 48344352 |
0.1845 | 0.8856 | 895 | 1.1133 | 48615424 |
0.1957 | 0.8905 | 900 | 1.1164 | 48886208 |
0.1864 | 0.8955 | 905 | 1.1168 | 49153480 |
0.1807 | 0.9004 | 910 | 1.1162 | 49423304 |
0.1484 | 0.9054 | 915 | 1.1162 | 49696776 |
0.1922 | 0.9103 | 920 | 1.1164 | 49978144 |
0.2536 | 0.9153 | 925 | 1.1164 | 50246656 |
0.2772 | 0.9202 | 930 | 1.1146 | 50519528 |
0.1272 | 0.9252 | 935 | 1.1143 | 50784920 |
0.1583 | 0.9301 | 940 | 1.1163 | 51064104 |
0.2417 | 0.9351 | 945 | 1.1146 | 51336640 |
0.1931 | 0.9400 | 950 | 1.1127 | 51611928 |
0.1275 | 0.9450 | 955 | 1.1146 | 51881976 |
0.2402 | 0.9499 | 960 | 1.1160 | 52155824 |
0.1722 | 0.9549 | 965 | 1.1125 | 52423080 |
0.1641 | 0.9598 | 970 | 1.1132 | 52696224 |
0.156 | 0.9647 | 975 | 1.1160 | 52965272 |
0.1804 | 0.9697 | 980 | 1.1143 | 53236816 |
0.1858 | 0.9746 | 985 | 1.1138 | 53507704 |
0.1585 | 0.9796 | 990 | 1.1140 | 53783232 |
0.1601 | 0.9845 | 995 | 1.1132 | 54053576 |
0.1974 | 0.9895 | 1000 | 1.1144 | 54321712 |
0.2114 | 0.9944 | 1005 | 1.1117 | 54594056 |
0.2106 | 0.9994 | 1010 | 1.1117 | 54862728 |
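The table rows are easy to process programmatically. The sketch below parses a small sample of rows copied from the table and derives the average number of input tokens per optimizer step. `parse_row` is a hypothetical helper, and the per-sequence figure assumes the effective batch size of 128 listed under the training hyperparameters.

```python
# A few rows copied verbatim from the table above (a sample, not the full log).
SAMPLE_ROWS = [
    "No log | 0 | 0 | 1.3956 | 0 |",
    "1.2252 | 0.0396 | 40 | 1.1675 | 2151456 |",
    "0.8733 | 0.0544 | 55 | 1.2430 | 2972760 |",
    "0.2106 | 0.9994 | 1010 | 1.1117 | 54862728 |",
]


def parse_row(line: str) -> dict:
    """Hypothetical helper: split one pipe-delimited row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    train_loss, epoch, step, val_loss, tokens = cells
    return {
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens": int(tokens),
    }


rows = [parse_row(line) for line in SAMPLE_ROWS]
final = rows[-1]

# Average input tokens consumed per optimizer step over the whole run.
tokens_per_step = final["tokens"] / final["step"]

# Assumption: 128 sequences per optimizer step (total_train_batch_size).
tokens_per_sequence = tokens_per_step / 128

# Row with the lowest validation loss within this sample.
best = min(rows, key=lambda r: r["val_loss"])

print(round(tokens_per_step), round(tokens_per_sequence), best["step"])
```

Pasting the full table into `SAMPLE_ROWS` would allow the same analysis over every logged step.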
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0
Base model
google/gemma-2-2b