# collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1054
- Num input tokens seen: 71761808
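The card does not include a usage example, so here is a minimal generation sketch. It assumes the checkpoint is published on the Hugging Face Hub under the id `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd2` (inferred from the card title) and that `transformers` and `torch` are installed; loading it downloads several GB of weights.

```python
# Illustrative sketch only; the model id is assumed from the card title.
MODEL_ID = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd2"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a completion with the fine-tuned checkpoint."""
    # Imported lazily so the module can be inspected without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example call (not executed here, since it triggers the weight download):
# print(generate("The capital of France is"))
```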
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
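As a sanity check on the numbers above, the effective batch size follows directly from the per-device batch size and gradient accumulation, and the warmup length can be estimated from the warmup ratio. This sketch assumes warmup steps are computed as `ceil(warmup_ratio * total_steps)`, as in recent `transformers` Trainer versions, and takes the ~1330 optimizer steps per epoch from the results table below.

```python
import math

# Effective batch size: per-device batch size times accumulation steps.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the reported value above

# Approximate warmup length for constant_with_warmup at ratio 0.05,
# assuming ceil(ratio * steps) and ~1330 total steps (one epoch).
total_steps = 1330
warmup_steps = math.ceil(0.05 * total_steps)
print(total_train_batch_size, warmup_steps)  # 128 67
```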
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.7401 | 0.0037 | 5 | 1.3949 | 265424 |
1.616 | 0.0075 | 10 | 1.3816 | 539224 |
1.6234 | 0.0112 | 15 | 1.3499 | 803360 |
1.5512 | 0.0150 | 20 | 1.2983 | 1070120 |
1.3853 | 0.0187 | 25 | 1.2531 | 1344176 |
1.389 | 0.0225 | 30 | 1.2195 | 1616536 |
1.3792 | 0.0262 | 35 | 1.1859 | 1890008 |
1.1895 | 0.0300 | 40 | 1.1837 | 2162864 |
1.1764 | 0.0337 | 45 | 1.1876 | 2444240 |
1.012 | 0.0375 | 50 | 1.2105 | 2717792 |
0.8381 | 0.0412 | 55 | 1.2296 | 2985376 |
0.739 | 0.0450 | 60 | 1.2775 | 3254792 |
0.6771 | 0.0487 | 65 | 1.2940 | 3527544 |
0.6312 | 0.0525 | 70 | 1.3021 | 3795200 |
0.4283 | 0.0562 | 75 | 1.3008 | 4070368 |
0.4371 | 0.0600 | 80 | 1.2797 | 4340040 |
0.4696 | 0.0637 | 85 | 1.2630 | 4606064 |
0.3194 | 0.0675 | 90 | 1.2419 | 4871584 |
0.4238 | 0.0712 | 95 | 1.2309 | 5147160 |
0.3507 | 0.0749 | 100 | 1.2619 | 5412296 |
0.2427 | 0.0787 | 105 | 1.2277 | 5687448 |
0.3179 | 0.0824 | 110 | 1.2445 | 5951552 |
0.2366 | 0.0862 | 115 | 1.2256 | 6215912 |
0.2594 | 0.0899 | 120 | 1.2126 | 6488088 |
0.2541 | 0.0937 | 125 | 1.2206 | 6757624 |
0.2565 | 0.0974 | 130 | 1.2042 | 7026376 |
0.2052 | 0.1012 | 135 | 1.2061 | 7298768 |
0.2036 | 0.1049 | 140 | 1.2040 | 7568328 |
0.2464 | 0.1087 | 145 | 1.2011 | 7831752 |
0.2566 | 0.1124 | 150 | 1.2057 | 8097784 |
0.2825 | 0.1162 | 155 | 1.1961 | 8362928 |
0.1344 | 0.1199 | 160 | 1.1928 | 8636976 |
0.2387 | 0.1237 | 165 | 1.2002 | 8905344 |
0.2114 | 0.1274 | 170 | 1.1923 | 9177200 |
0.2481 | 0.1312 | 175 | 1.1907 | 9442984 |
0.1336 | 0.1349 | 180 | 1.1866 | 9713520 |
0.2392 | 0.1386 | 185 | 1.1908 | 9974536 |
0.2917 | 0.1424 | 190 | 1.1850 | 10244464 |
0.1185 | 0.1461 | 195 | 1.1827 | 10511152 |
0.2313 | 0.1499 | 200 | 1.1918 | 10771216 |
0.1684 | 0.1536 | 205 | 1.1852 | 11036864 |
0.1677 | 0.1574 | 210 | 1.1822 | 11302432 |
0.25 | 0.1611 | 215 | 1.1873 | 11571464 |
0.2106 | 0.1649 | 220 | 1.1819 | 11835776 |
0.2641 | 0.1686 | 225 | 1.1805 | 12106576 |
0.1897 | 0.1724 | 230 | 1.1840 | 12376584 |
0.1947 | 0.1761 | 235 | 1.1791 | 12644224 |
0.231 | 0.1799 | 240 | 1.1800 | 12919288 |
0.1835 | 0.1836 | 245 | 1.1786 | 13180264 |
0.1879 | 0.1874 | 250 | 1.1730 | 13451568 |
0.1873 | 0.1911 | 255 | 1.1775 | 13721800 |
0.1545 | 0.1949 | 260 | 1.1741 | 13992424 |
0.2525 | 0.1986 | 265 | 1.1696 | 14259608 |
0.1461 | 0.2024 | 270 | 1.1731 | 14521656 |
0.1881 | 0.2061 | 275 | 1.1672 | 14794472 |
0.1946 | 0.2098 | 280 | 1.1751 | 15063640 |
0.1882 | 0.2136 | 285 | 1.1709 | 15333992 |
0.237 | 0.2173 | 290 | 1.1693 | 15610184 |
0.1811 | 0.2211 | 295 | 1.1689 | 15873744 |
0.1678 | 0.2248 | 300 | 1.1730 | 16142552 |
0.1883 | 0.2286 | 305 | 1.1686 | 16414088 |
0.1956 | 0.2323 | 310 | 1.1640 | 16681512 |
0.1685 | 0.2361 | 315 | 1.1728 | 16951760 |
0.1759 | 0.2398 | 320 | 1.1614 | 17213336 |
0.1901 | 0.2436 | 325 | 1.1636 | 17480968 |
0.1511 | 0.2473 | 330 | 1.1634 | 17756504 |
0.2034 | 0.2511 | 335 | 1.1629 | 18020040 |
0.2454 | 0.2548 | 340 | 1.1594 | 18286944 |
0.1758 | 0.2586 | 345 | 1.1650 | 18560152 |
0.1308 | 0.2623 | 350 | 1.1582 | 18826832 |
0.145 | 0.2661 | 355 | 1.1586 | 19103144 |
0.1719 | 0.2698 | 360 | 1.1614 | 19377096 |
0.1671 | 0.2735 | 365 | 1.1587 | 19653032 |
0.1718 | 0.2773 | 370 | 1.1563 | 19924256 |
0.2379 | 0.2810 | 375 | 1.1547 | 20202808 |
0.1866 | 0.2848 | 380 | 1.1536 | 20475992 |
0.1852 | 0.2885 | 385 | 1.1570 | 20744808 |
0.2241 | 0.2923 | 390 | 1.1532 | 21007656 |
0.1906 | 0.2960 | 395 | 1.1500 | 21288888 |
0.1631 | 0.2998 | 400 | 1.1501 | 21548352 |
0.1641 | 0.3035 | 405 | 1.1483 | 21811440 |
0.1462 | 0.3073 | 410 | 1.1515 | 22079576 |
0.171 | 0.3110 | 415 | 1.1500 | 22350648 |
0.1351 | 0.3148 | 420 | 1.1466 | 22629320 |
0.1691 | 0.3185 | 425 | 1.1483 | 22898512 |
0.1566 | 0.3223 | 430 | 1.1465 | 23159280 |
0.2116 | 0.3260 | 435 | 1.1447 | 23428672 |
0.1942 | 0.3298 | 440 | 1.1497 | 23695520 |
0.1723 | 0.3335 | 445 | 1.1478 | 23967384 |
0.1023 | 0.3373 | 450 | 1.1481 | 24232440 |
0.1779 | 0.3410 | 455 | 1.1472 | 24499224 |
0.1213 | 0.3447 | 460 | 1.1399 | 24763840 |
0.1693 | 0.3485 | 465 | 1.1438 | 25039768 |
0.1308 | 0.3522 | 470 | 1.1475 | 25308848 |
0.2497 | 0.3560 | 475 | 1.1417 | 25581448 |
0.1648 | 0.3597 | 480 | 1.1398 | 25850576 |
0.2038 | 0.3635 | 485 | 1.1441 | 26118912 |
0.1375 | 0.3672 | 490 | 1.1402 | 26385976 |
0.1453 | 0.3710 | 495 | 1.1402 | 26659912 |
0.1777 | 0.3747 | 500 | 1.1406 | 26932456 |
0.129 | 0.3785 | 505 | 1.1407 | 27206072 |
0.2137 | 0.3822 | 510 | 1.1433 | 27476240 |
0.1269 | 0.3860 | 515 | 1.1405 | 27742424 |
0.1543 | 0.3897 | 520 | 1.1399 | 28015344 |
0.1309 | 0.3935 | 525 | 1.1391 | 28288264 |
0.1399 | 0.3972 | 530 | 1.1400 | 28554464 |
0.0881 | 0.4010 | 535 | 1.1410 | 28825328 |
0.1215 | 0.4047 | 540 | 1.1386 | 29099088 |
0.185 | 0.4085 | 545 | 1.1365 | 29368600 |
0.1721 | 0.4122 | 550 | 1.1362 | 29636064 |
0.1385 | 0.4159 | 555 | 1.1373 | 29900800 |
0.1195 | 0.4197 | 560 | 1.1364 | 30171432 |
0.1734 | 0.4234 | 565 | 1.1357 | 30438336 |
0.1677 | 0.4272 | 570 | 1.1338 | 30714016 |
0.1777 | 0.4309 | 575 | 1.1351 | 30981568 |
0.1973 | 0.4347 | 580 | 1.1343 | 31251464 |
0.1048 | 0.4384 | 585 | 1.1299 | 31524320 |
0.1891 | 0.4422 | 590 | 1.1314 | 31792104 |
0.1486 | 0.4459 | 595 | 1.1342 | 32070192 |
0.122 | 0.4497 | 600 | 1.1333 | 32343416 |
0.1051 | 0.4534 | 605 | 1.1317 | 32611776 |
0.1072 | 0.4572 | 610 | 1.1305 | 32874304 |
0.1829 | 0.4609 | 615 | 1.1302 | 33134032 |
0.1181 | 0.4647 | 620 | 1.1295 | 33395984 |
0.1172 | 0.4684 | 625 | 1.1343 | 33663904 |
0.1387 | 0.4722 | 630 | 1.1284 | 33937656 |
0.1545 | 0.4759 | 635 | 1.1261 | 34214032 |
0.1341 | 0.4796 | 640 | 1.1327 | 34480960 |
0.2333 | 0.4834 | 645 | 1.1325 | 34750776 |
0.1263 | 0.4871 | 650 | 1.1268 | 35017280 |
0.1657 | 0.4909 | 655 | 1.1295 | 35292376 |
0.1027 | 0.4946 | 660 | 1.1298 | 35557680 |
0.1902 | 0.4984 | 665 | 1.1263 | 35825768 |
0.1084 | 0.5021 | 670 | 1.1269 | 36099992 |
0.191 | 0.5059 | 675 | 1.1288 | 36368104 |
0.1571 | 0.5096 | 680 | 1.1252 | 36633744 |
0.2993 | 0.5134 | 685 | 1.1253 | 36903584 |
0.1279 | 0.5171 | 690 | 1.1277 | 37168288 |
0.1675 | 0.5209 | 695 | 1.1295 | 37432752 |
0.1387 | 0.5246 | 700 | 1.1255 | 37700872 |
0.1349 | 0.5284 | 705 | 1.1277 | 37969576 |
0.1273 | 0.5321 | 710 | 1.1309 | 38243016 |
0.1643 | 0.5359 | 715 | 1.1274 | 38517288 |
0.1605 | 0.5396 | 720 | 1.1250 | 38782776 |
0.1359 | 0.5434 | 725 | 1.1255 | 39059112 |
0.134 | 0.5471 | 730 | 1.1292 | 39333136 |
0.1868 | 0.5508 | 735 | 1.1266 | 39603304 |
0.1035 | 0.5546 | 740 | 1.1231 | 39878104 |
0.0887 | 0.5583 | 745 | 1.1251 | 40155384 |
0.1917 | 0.5621 | 750 | 1.1250 | 40422408 |
0.1682 | 0.5658 | 755 | 1.1221 | 40693840 |
0.203 | 0.5696 | 760 | 1.1221 | 40964832 |
0.0827 | 0.5733 | 765 | 1.1218 | 41236656 |
0.1417 | 0.5771 | 770 | 1.1233 | 41504136 |
0.1631 | 0.5808 | 775 | 1.1222 | 41769800 |
0.1691 | 0.5846 | 780 | 1.1213 | 42034128 |
0.174 | 0.5883 | 785 | 1.1192 | 42305168 |
0.2106 | 0.5921 | 790 | 1.1218 | 42574496 |
0.1259 | 0.5958 | 795 | 1.1240 | 42845032 |
0.1654 | 0.5996 | 800 | 1.1237 | 43112152 |
0.1042 | 0.6033 | 805 | 1.1219 | 43373840 |
0.1091 | 0.6071 | 810 | 1.1214 | 43640656 |
0.198 | 0.6108 | 815 | 1.1221 | 43912768 |
0.1154 | 0.6145 | 820 | 1.1201 | 44179992 |
0.1564 | 0.6183 | 825 | 1.1197 | 44444896 |
0.1316 | 0.6220 | 830 | 1.1210 | 44710784 |
0.1439 | 0.6258 | 835 | 1.1201 | 44988600 |
0.0951 | 0.6295 | 840 | 1.1188 | 45261808 |
0.1782 | 0.6333 | 845 | 1.1189 | 45533720 |
0.1605 | 0.6370 | 850 | 1.1197 | 45806680 |
0.1381 | 0.6408 | 855 | 1.1201 | 46077920 |
0.1733 | 0.6445 | 860 | 1.1195 | 46340032 |
0.0857 | 0.6483 | 865 | 1.1209 | 46604672 |
0.1277 | 0.6520 | 870 | 1.1215 | 46876896 |
0.1226 | 0.6558 | 875 | 1.1198 | 47146632 |
0.0836 | 0.6595 | 880 | 1.1207 | 47422784 |
0.1251 | 0.6633 | 885 | 1.1204 | 47689336 |
0.125 | 0.6670 | 890 | 1.1182 | 47958016 |
0.1048 | 0.6708 | 895 | 1.1204 | 48223864 |
0.1882 | 0.6745 | 900 | 1.1207 | 48488808 |
0.2256 | 0.6783 | 905 | 1.1157 | 48759288 |
0.1635 | 0.6820 | 910 | 1.1143 | 49031648 |
0.205 | 0.6857 | 915 | 1.1185 | 49294048 |
0.1455 | 0.6895 | 920 | 1.1197 | 49568528 |
0.111 | 0.6932 | 925 | 1.1179 | 49838992 |
0.1617 | 0.6970 | 930 | 1.1203 | 50102856 |
0.1987 | 0.7007 | 935 | 1.1191 | 50369176 |
0.1317 | 0.7045 | 940 | 1.1147 | 50638040 |
0.1474 | 0.7082 | 945 | 1.1166 | 50905480 |
0.1999 | 0.7120 | 950 | 1.1157 | 51179032 |
0.1198 | 0.7157 | 955 | 1.1148 | 51443536 |
0.1666 | 0.7195 | 960 | 1.1168 | 51708464 |
0.1456 | 0.7232 | 965 | 1.1177 | 51979744 |
0.1664 | 0.7270 | 970 | 1.1158 | 52254064 |
0.1317 | 0.7307 | 975 | 1.1164 | 52521544 |
0.2294 | 0.7345 | 980 | 1.1197 | 52784920 |
0.1252 | 0.7382 | 985 | 1.1180 | 53053784 |
0.1464 | 0.7420 | 990 | 1.1154 | 53322616 |
0.1474 | 0.7457 | 995 | 1.1167 | 53595016 |
0.1162 | 0.7494 | 1000 | 1.1175 | 53860928 |
0.1531 | 0.7532 | 1005 | 1.1148 | 54138576 |
0.186 | 0.7569 | 1010 | 1.1152 | 54411960 |
0.1991 | 0.7607 | 1015 | 1.1148 | 54679632 |
0.1347 | 0.7644 | 1020 | 1.1128 | 54940112 |
0.1664 | 0.7682 | 1025 | 1.1140 | 55209760 |
0.1538 | 0.7719 | 1030 | 1.1136 | 55478480 |
0.1654 | 0.7757 | 1035 | 1.1140 | 55743160 |
0.1235 | 0.7794 | 1040 | 1.1132 | 56020920 |
0.1584 | 0.7832 | 1045 | 1.1120 | 56293368 |
0.1833 | 0.7869 | 1050 | 1.1114 | 56556400 |
0.1827 | 0.7907 | 1055 | 1.1120 | 56817288 |
0.1177 | 0.7944 | 1060 | 1.1125 | 57082560 |
0.1671 | 0.7982 | 1065 | 1.1126 | 57353928 |
0.1281 | 0.8019 | 1070 | 1.1137 | 57615912 |
0.0644 | 0.8057 | 1075 | 1.1129 | 57892704 |
0.155 | 0.8094 | 1080 | 1.1116 | 58160816 |
0.1137 | 0.8132 | 1085 | 1.1136 | 58427224 |
0.1173 | 0.8169 | 1090 | 1.1159 | 58697032 |
0.1177 | 0.8206 | 1095 | 1.1140 | 58958536 |
0.1739 | 0.8244 | 1100 | 1.1124 | 59223816 |
0.1382 | 0.8281 | 1105 | 1.1092 | 59485128 |
0.1059 | 0.8319 | 1110 | 1.1099 | 59749504 |
0.0573 | 0.8356 | 1115 | 1.1125 | 60018192 |
0.1833 | 0.8394 | 1120 | 1.1139 | 60286688 |
0.1041 | 0.8431 | 1125 | 1.1116 | 60560560 |
0.1508 | 0.8469 | 1130 | 1.1120 | 60830912 |
0.0838 | 0.8506 | 1135 | 1.1172 | 61103064 |
0.0997 | 0.8544 | 1140 | 1.1154 | 61368104 |
0.1566 | 0.8581 | 1145 | 1.1107 | 61628792 |
0.1589 | 0.8619 | 1150 | 1.1091 | 61897384 |
0.1563 | 0.8656 | 1155 | 1.1100 | 62167304 |
0.1451 | 0.8694 | 1160 | 1.1101 | 62435640 |
0.167 | 0.8731 | 1165 | 1.1092 | 62701456 |
0.1815 | 0.8769 | 1170 | 1.1098 | 62967384 |
0.1653 | 0.8806 | 1175 | 1.1097 | 63228816 |
0.1097 | 0.8844 | 1180 | 1.1104 | 63495112 |
0.1538 | 0.8881 | 1185 | 1.1098 | 63772160 |
0.1251 | 0.8918 | 1190 | 1.1089 | 64041840 |
0.1401 | 0.8956 | 1195 | 1.1099 | 64306880 |
0.1343 | 0.8993 | 1200 | 1.1103 | 64577568 |
0.1262 | 0.9031 | 1205 | 1.1095 | 64851096 |
0.131 | 0.9068 | 1210 | 1.1093 | 65116080 |
0.1453 | 0.9106 | 1215 | 1.1110 | 65381352 |
0.0764 | 0.9143 | 1220 | 1.1113 | 65653032 |
0.1491 | 0.9181 | 1225 | 1.1087 | 65918040 |
0.1883 | 0.9218 | 1230 | 1.1076 | 66192000 |
0.1539 | 0.9256 | 1235 | 1.1078 | 66463448 |
0.1338 | 0.9293 | 1240 | 1.1095 | 66728856 |
0.1632 | 0.9331 | 1245 | 1.1068 | 66994328 |
0.0888 | 0.9368 | 1250 | 1.1054 | 67260040 |
0.1309 | 0.9406 | 1255 | 1.1065 | 67530168 |
0.082 | 0.9443 | 1260 | 1.1097 | 67802552 |
0.1326 | 0.9481 | 1265 | 1.1089 | 68065552 |
0.1835 | 0.9518 | 1270 | 1.1055 | 68333520 |
0.1417 | 0.9555 | 1275 | 1.1064 | 68604176 |
0.1448 | 0.9593 | 1280 | 1.1114 | 68869712 |
0.1336 | 0.9630 | 1285 | 1.1085 | 69147456 |
0.1582 | 0.9668 | 1290 | 1.1065 | 69416744 |
0.0791 | 0.9705 | 1295 | 1.1057 | 69683232 |
0.1824 | 0.9743 | 1300 | 1.1073 | 69945640 |
0.1288 | 0.9780 | 1305 | 1.1101 | 70213592 |
0.1758 | 0.9818 | 1310 | 1.1079 | 70485672 |
0.1566 | 0.9855 | 1315 | 1.1063 | 70752536 |
0.1071 | 0.9893 | 1320 | 1.1077 | 71012464 |
0.1201 | 0.9930 | 1325 | 1.1091 | 71281048 |
0.1388 | 0.9968 | 1330 | 1.1071 | 71547504 |
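The log above is long; a short helper like the following (illustrative, not part of the training code) pulls the best checkpoint out of such a `(step, validation_loss)` log. Here it runs on a few rows sampled from the table:

```python
# Find the step with the lowest validation loss in a (step, val_loss) log.
# A handful of rows sampled from the table above; the full log has 267 rows.
log = [
    (0, 1.3956),
    (40, 1.1837),
    (400, 1.1501),
    (800, 1.1237),
    (1250, 1.1054),
    (1330, 1.1071),
]

best_step, best_loss = min(log, key=lambda row: row[1])
print(best_step, best_loss)  # 1250 1.1054
```

Note that the lowest validation loss (1.1054, at step 1250) matches the evaluation loss reported at the top of the card, rather than the loss at the final step.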
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1