collapse_gemma-2-2b_hs2_accumulate_iter18_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0987
- Num Input Tokens Seen: 92143336
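For a language model, an evaluation loss is often easier to read as a perplexity. The sketch below converts the reported loss, assuming (the card does not say so) that it is the mean per-token cross-entropy in nats.

```python
import math

eval_loss = 1.0987  # final evaluation loss reported above

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")  # perplexity = 3.00
```

Under that assumption, the model is on average about as uncertain as a uniform choice among three tokens at each position.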
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
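The batch-size and scheduler entries above are related by simple arithmetic. The following sketch reproduces total_train_batch_size from the per-device batch size and the gradient accumulation steps, and shows the shape of a constant_with_warmup schedule. The device count and the total step count are assumptions, not values stated in this card; `lr_at` is a hypothetical helper written for illustration, not a library function.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumptions (not stated in the card): a single device, and roughly 1732
# optimizer steps in the epoch (extrapolated from step 1730 at epoch 0.9991).
num_devices = 1
total_steps = 1732

effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to learning_rate, then held constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(effective_batch_size)  # 128, matching total_train_batch_size
print(lr_at(0), lr_at(warmup_steps))
```

With these assumed values the warmup covers only the first few dozen optimizer steps, after which the learning rate stays at 8e-06 for the rest of the epoch.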
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6987 | 0.0029 | 5 | 1.3904 | 270960 |
1.6486 | 0.0058 | 10 | 1.3832 | 543776 |
1.6449 | 0.0087 | 15 | 1.3606 | 806136 |
1.5808 | 0.0116 | 20 | 1.3291 | 1069472 |
1.5584 | 0.0144 | 25 | 1.2860 | 1329032 |
1.4797 | 0.0173 | 30 | 1.2449 | 1597912 |
1.3686 | 0.0202 | 35 | 1.2173 | 1866088 |
1.2231 | 0.0231 | 40 | 1.1920 | 2134728 |
1.0982 | 0.0260 | 45 | 1.1923 | 2409176 |
0.9703 | 0.0289 | 50 | 1.2209 | 2669112 |
0.9416 | 0.0318 | 55 | 1.2490 | 2934120 |
0.7829 | 0.0347 | 60 | 1.2681 | 3202328 |
0.7175 | 0.0375 | 65 | 1.2750 | 3468824 |
0.6113 | 0.0404 | 70 | 1.3062 | 3737368 |
0.4934 | 0.0433 | 75 | 1.3147 | 4000568 |
0.4708 | 0.0462 | 80 | 1.3241 | 4269560 |
0.3086 | 0.0491 | 85 | 1.2773 | 4527344 |
0.3285 | 0.0520 | 90 | 1.2721 | 4793952 |
0.3032 | 0.0549 | 95 | 1.2753 | 5062120 |
0.2329 | 0.0578 | 100 | 1.2260 | 5326728 |
0.3035 | 0.0606 | 105 | 1.2274 | 5600864 |
0.2459 | 0.0635 | 110 | 1.2213 | 5864096 |
0.2759 | 0.0664 | 115 | 1.2208 | 6122280 |
0.18 | 0.0693 | 120 | 1.2135 | 6389656 |
0.1301 | 0.0722 | 125 | 1.2084 | 6654192 |
0.2706 | 0.0751 | 130 | 1.1997 | 6926752 |
0.2556 | 0.0780 | 135 | 1.2031 | 7195712 |
0.2105 | 0.0809 | 140 | 1.2011 | 7464520 |
0.1426 | 0.0837 | 145 | 1.1993 | 7730008 |
0.1783 | 0.0866 | 150 | 1.2052 | 7998384 |
0.1648 | 0.0895 | 155 | 1.1888 | 8273232 |
0.1672 | 0.0924 | 160 | 1.1909 | 8546224 |
0.1124 | 0.0953 | 165 | 1.1949 | 8821008 |
0.1286 | 0.0982 | 170 | 1.1836 | 9091072 |
0.1845 | 0.1011 | 175 | 1.1917 | 9359184 |
0.2387 | 0.1040 | 180 | 1.1794 | 9619376 |
0.2107 | 0.1068 | 185 | 1.1842 | 9885592 |
0.1587 | 0.1097 | 190 | 1.1820 | 10152520 |
0.1815 | 0.1126 | 195 | 1.1793 | 10415520 |
0.1845 | 0.1155 | 200 | 1.1765 | 10682032 |
0.1766 | 0.1184 | 205 | 1.1769 | 10953560 |
0.1884 | 0.1213 | 210 | 1.1726 | 11223304 |
0.1853 | 0.1242 | 215 | 1.1755 | 11495576 |
0.184 | 0.1271 | 220 | 1.1777 | 11757464 |
0.1219 | 0.1299 | 225 | 1.1683 | 12015416 |
0.1437 | 0.1328 | 230 | 1.1761 | 12289472 |
0.1693 | 0.1357 | 235 | 1.1720 | 12558632 |
0.1442 | 0.1386 | 240 | 1.1686 | 12822488 |
0.2042 | 0.1415 | 245 | 1.1665 | 13079816 |
0.2118 | 0.1444 | 250 | 1.1705 | 13346176 |
0.1309 | 0.1473 | 255 | 1.1677 | 13611488 |
0.1784 | 0.1502 | 260 | 1.1683 | 13875176 |
0.1946 | 0.1530 | 265 | 1.1684 | 14141064 |
0.2299 | 0.1559 | 270 | 1.1637 | 14405528 |
0.1208 | 0.1588 | 275 | 1.1620 | 14674184 |
0.1641 | 0.1617 | 280 | 1.1586 | 14938784 |
0.1519 | 0.1646 | 285 | 1.1576 | 15202512 |
0.1524 | 0.1675 | 290 | 1.1634 | 15459472 |
0.1406 | 0.1704 | 295 | 1.1566 | 15724376 |
0.1566 | 0.1733 | 300 | 1.1578 | 15991128 |
0.1077 | 0.1761 | 305 | 1.1562 | 16258728 |
0.0978 | 0.1790 | 310 | 1.1572 | 16531472 |
0.1482 | 0.1819 | 315 | 1.1593 | 16795440 |
0.11 | 0.1848 | 320 | 1.1581 | 17054568 |
0.1224 | 0.1877 | 325 | 1.1565 | 17320832 |
0.1237 | 0.1906 | 330 | 1.1548 | 17589160 |
0.1225 | 0.1935 | 335 | 1.1549 | 17859592 |
0.1546 | 0.1964 | 340 | 1.1556 | 18122264 |
0.1408 | 0.1992 | 345 | 1.1552 | 18385856 |
0.121 | 0.2021 | 350 | 1.1556 | 18659936 |
0.2222 | 0.2050 | 355 | 1.1535 | 18918392 |
0.1528 | 0.2079 | 360 | 1.1541 | 19182728 |
0.1534 | 0.2108 | 365 | 1.1531 | 19450968 |
0.1442 | 0.2137 | 370 | 1.1496 | 19714280 |
0.1244 | 0.2166 | 375 | 1.1492 | 19977192 |
0.1912 | 0.2195 | 380 | 1.1534 | 20245192 |
0.1174 | 0.2224 | 385 | 1.1512 | 20509528 |
0.1046 | 0.2252 | 390 | 1.1502 | 20777056 |
0.1868 | 0.2281 | 395 | 1.1460 | 21041064 |
0.1649 | 0.2310 | 400 | 1.1449 | 21300688 |
0.1247 | 0.2339 | 405 | 1.1452 | 21571768 |
0.1122 | 0.2368 | 410 | 1.1434 | 21841096 |
0.2296 | 0.2397 | 415 | 1.1419 | 22107696 |
0.1551 | 0.2426 | 420 | 1.1422 | 22377296 |
0.1198 | 0.2455 | 425 | 1.1438 | 22647200 |
0.1214 | 0.2483 | 430 | 1.1441 | 22909288 |
0.1918 | 0.2512 | 435 | 1.1455 | 23176680 |
0.1422 | 0.2541 | 440 | 1.1450 | 23446384 |
0.1168 | 0.2570 | 445 | 1.1442 | 23711824 |
0.099 | 0.2599 | 450 | 1.1412 | 23974416 |
0.1084 | 0.2628 | 455 | 1.1436 | 24238504 |
0.1797 | 0.2657 | 460 | 1.1436 | 24504696 |
0.177 | 0.2686 | 465 | 1.1398 | 24765792 |
0.1445 | 0.2714 | 470 | 1.1427 | 25032552 |
0.1558 | 0.2743 | 475 | 1.1380 | 25295104 |
0.1002 | 0.2772 | 480 | 1.1363 | 25555720 |
0.1082 | 0.2801 | 485 | 1.1421 | 25817792 |
0.1059 | 0.2830 | 490 | 1.1393 | 26081592 |
0.1164 | 0.2859 | 495 | 1.1368 | 26347656 |
0.096 | 0.2888 | 500 | 1.1381 | 26617936 |
0.1255 | 0.2917 | 505 | 1.1374 | 26884744 |
0.1734 | 0.2945 | 510 | 1.1373 | 27149264 |
0.1357 | 0.2974 | 515 | 1.1365 | 27417000 |
0.1836 | 0.3003 | 520 | 1.1372 | 27682520 |
0.0934 | 0.3032 | 525 | 1.1400 | 27949544 |
0.0914 | 0.3061 | 530 | 1.1380 | 28216896 |
0.1157 | 0.3090 | 535 | 1.1353 | 28479296 |
0.146 | 0.3119 | 540 | 1.1351 | 28755216 |
0.1954 | 0.3148 | 545 | 1.1357 | 29019640 |
0.1166 | 0.3176 | 550 | 1.1334 | 29281320 |
0.1295 | 0.3205 | 555 | 1.1343 | 29546448 |
0.1361 | 0.3234 | 560 | 1.1355 | 29805376 |
0.1249 | 0.3263 | 565 | 1.1329 | 30074584 |
0.1307 | 0.3292 | 570 | 1.1340 | 30343192 |
0.1761 | 0.3321 | 575 | 1.1352 | 30600184 |
0.1241 | 0.3350 | 580 | 1.1304 | 30865784 |
0.1802 | 0.3379 | 585 | 1.1308 | 31131480 |
0.1077 | 0.3407 | 590 | 1.1331 | 31400416 |
0.2017 | 0.3436 | 595 | 1.1331 | 31672048 |
0.1348 | 0.3465 | 600 | 1.1299 | 31937752 |
0.1469 | 0.3494 | 605 | 1.1312 | 32203736 |
0.0765 | 0.3523 | 610 | 1.1303 | 32469208 |
0.1269 | 0.3552 | 615 | 1.1302 | 32734112 |
0.0929 | 0.3581 | 620 | 1.1305 | 33006400 |
0.2169 | 0.3610 | 625 | 1.1301 | 33270272 |
0.0898 | 0.3638 | 630 | 1.1272 | 33532832 |
0.1692 | 0.3667 | 635 | 1.1290 | 33797256 |
0.09 | 0.3696 | 640 | 1.1289 | 34063048 |
0.0708 | 0.3725 | 645 | 1.1301 | 34326024 |
0.1575 | 0.3754 | 650 | 1.1272 | 34597328 |
0.1042 | 0.3783 | 655 | 1.1269 | 34858432 |
0.1163 | 0.3812 | 660 | 1.1316 | 35121080 |
0.1444 | 0.3841 | 665 | 1.1375 | 35389184 |
0.1591 | 0.3869 | 670 | 1.1321 | 35649584 |
0.0957 | 0.3898 | 675 | 1.1245 | 35917320 |
0.2456 | 0.3927 | 680 | 1.1294 | 36187656 |
0.1111 | 0.3956 | 685 | 1.1298 | 36444688 |
0.103 | 0.3985 | 690 | 1.1280 | 36709440 |
0.0784 | 0.4014 | 695 | 1.1259 | 36972760 |
0.1514 | 0.4043 | 700 | 1.1284 | 37233720 |
0.1235 | 0.4072 | 705 | 1.1293 | 37497496 |
0.086 | 0.4100 | 710 | 1.1256 | 37762992 |
0.1205 | 0.4129 | 715 | 1.1255 | 38029864 |
0.0625 | 0.4158 | 720 | 1.1272 | 38296328 |
0.1199 | 0.4187 | 725 | 1.1257 | 38559600 |
0.1254 | 0.4216 | 730 | 1.1238 | 38833968 |
0.13 | 0.4245 | 735 | 1.1254 | 39100088 |
0.0957 | 0.4274 | 740 | 1.1274 | 39366176 |
0.1801 | 0.4303 | 745 | 1.1217 | 39636032 |
0.0944 | 0.4332 | 750 | 1.1207 | 39903584 |
0.1007 | 0.4360 | 755 | 1.1252 | 40163224 |
0.1033 | 0.4389 | 760 | 1.1256 | 40428144 |
0.1029 | 0.4418 | 765 | 1.1218 | 40697240 |
0.0746 | 0.4447 | 770 | 1.1230 | 40962568 |
0.1095 | 0.4476 | 775 | 1.1250 | 41228136 |
0.1302 | 0.4505 | 780 | 1.1244 | 41496296 |
0.1077 | 0.4534 | 785 | 1.1234 | 41760456 |
0.1226 | 0.4563 | 790 | 1.1204 | 42022912 |
0.1361 | 0.4591 | 795 | 1.1195 | 42291248 |
0.1083 | 0.4620 | 800 | 1.1202 | 42562552 |
0.1502 | 0.4649 | 805 | 1.1204 | 42833160 |
0.1147 | 0.4678 | 810 | 1.1204 | 43098648 |
0.1306 | 0.4707 | 815 | 1.1216 | 43360472 |
0.114 | 0.4736 | 820 | 1.1220 | 43628632 |
0.1 | 0.4765 | 825 | 1.1198 | 43889976 |
0.1245 | 0.4794 | 830 | 1.1207 | 44162040 |
0.1761 | 0.4822 | 835 | 1.1200 | 44429408 |
0.1565 | 0.4851 | 840 | 1.1190 | 44694248 |
0.1473 | 0.4880 | 845 | 1.1174 | 44967944 |
0.0811 | 0.4909 | 850 | 1.1188 | 45233848 |
0.0874 | 0.4938 | 855 | 1.1186 | 45501712 |
0.1277 | 0.4967 | 860 | 1.1189 | 45770224 |
0.1056 | 0.4996 | 865 | 1.1173 | 46026120 |
0.0927 | 0.5025 | 870 | 1.1164 | 46293248 |
0.1233 | 0.5053 | 875 | 1.1164 | 46559048 |
0.1055 | 0.5082 | 880 | 1.1181 | 46817752 |
0.132 | 0.5111 | 885 | 1.1189 | 47088288 |
0.108 | 0.5140 | 890 | 1.1168 | 47354792 |
0.1097 | 0.5169 | 895 | 1.1170 | 47621808 |
0.1805 | 0.5198 | 900 | 1.1151 | 47892168 |
0.1229 | 0.5227 | 905 | 1.1151 | 48164096 |
0.1484 | 0.5256 | 910 | 1.1181 | 48434336 |
0.1245 | 0.5284 | 915 | 1.1175 | 48701160 |
0.0801 | 0.5313 | 920 | 1.1155 | 48966920 |
0.0684 | 0.5342 | 925 | 1.1150 | 49231088 |
0.1012 | 0.5371 | 930 | 1.1172 | 49500872 |
0.0826 | 0.5400 | 935 | 1.1169 | 49764080 |
0.0547 | 0.5429 | 940 | 1.1156 | 50037824 |
0.1756 | 0.5458 | 945 | 1.1166 | 50313056 |
0.1313 | 0.5487 | 950 | 1.1165 | 50584128 |
0.1571 | 0.5515 | 955 | 1.1141 | 50847832 |
0.1404 | 0.5544 | 960 | 1.1148 | 51107992 |
0.1436 | 0.5573 | 965 | 1.1144 | 51370408 |
0.1767 | 0.5602 | 970 | 1.1130 | 51639728 |
0.15 | 0.5631 | 975 | 1.1121 | 51905288 |
0.1444 | 0.5660 | 980 | 1.1147 | 52176536 |
0.13 | 0.5689 | 985 | 1.1159 | 52449872 |
0.1294 | 0.5718 | 990 | 1.1149 | 52714680 |
0.1163 | 0.5746 | 995 | 1.1136 | 52980096 |
0.0975 | 0.5775 | 1000 | 1.1133 | 53242168 |
0.1348 | 0.5804 | 1005 | 1.1144 | 53515240 |
0.0872 | 0.5833 | 1010 | 1.1130 | 53776040 |
0.0634 | 0.5862 | 1015 | 1.1133 | 54042552 |
0.1704 | 0.5891 | 1020 | 1.1138 | 54309432 |
0.0965 | 0.5920 | 1025 | 1.1138 | 54576264 |
0.1 | 0.5949 | 1030 | 1.1143 | 54840264 |
0.1074 | 0.5977 | 1035 | 1.1140 | 55106192 |
0.101 | 0.6006 | 1040 | 1.1116 | 55370344 |
0.1473 | 0.6035 | 1045 | 1.1112 | 55637160 |
0.0814 | 0.6064 | 1050 | 1.1133 | 55906880 |
0.1764 | 0.6093 | 1055 | 1.1135 | 56178392 |
0.103 | 0.6122 | 1060 | 1.1120 | 56439152 |
0.1243 | 0.6151 | 1065 | 1.1120 | 56708416 |
0.1113 | 0.6180 | 1070 | 1.1122 | 56975080 |
0.1242 | 0.6208 | 1075 | 1.1118 | 57234312 |
0.0737 | 0.6237 | 1080 | 1.1114 | 57504112 |
0.1164 | 0.6266 | 1085 | 1.1145 | 57764192 |
0.1563 | 0.6295 | 1090 | 1.1125 | 58029744 |
0.144 | 0.6324 | 1095 | 1.1097 | 58296288 |
0.1292 | 0.6353 | 1100 | 1.1100 | 58559256 |
0.0958 | 0.6382 | 1105 | 1.1111 | 58816976 |
0.1067 | 0.6411 | 1110 | 1.1117 | 59088408 |
0.1166 | 0.6440 | 1115 | 1.1126 | 59349992 |
0.1541 | 0.6468 | 1120 | 1.1108 | 59608656 |
0.0775 | 0.6497 | 1125 | 1.1102 | 59877288 |
0.1546 | 0.6526 | 1130 | 1.1120 | 60144648 |
0.0741 | 0.6555 | 1135 | 1.1118 | 60414784 |
0.1158 | 0.6584 | 1140 | 1.1101 | 60687248 |
0.1345 | 0.6613 | 1145 | 1.1108 | 60956640 |
0.1763 | 0.6642 | 1150 | 1.1115 | 61222976 |
0.0611 | 0.6671 | 1155 | 1.1117 | 61489536 |
0.1453 | 0.6699 | 1160 | 1.1115 | 61762704 |
0.1826 | 0.6728 | 1165 | 1.1092 | 62028760 |
0.0834 | 0.6757 | 1170 | 1.1094 | 62298616 |
0.1709 | 0.6786 | 1175 | 1.1107 | 62568072 |
0.1787 | 0.6815 | 1180 | 1.1090 | 62832512 |
0.1068 | 0.6844 | 1185 | 1.1086 | 63094744 |
0.1228 | 0.6873 | 1190 | 1.1074 | 63363208 |
0.1137 | 0.6902 | 1195 | 1.1071 | 63643528 |
0.0934 | 0.6930 | 1200 | 1.1072 | 63911528 |
0.1905 | 0.6959 | 1205 | 1.1072 | 64172360 |
0.1285 | 0.6988 | 1210 | 1.1090 | 64439392 |
0.1405 | 0.7017 | 1215 | 1.1103 | 64711128 |
0.1031 | 0.7046 | 1220 | 1.1102 | 64974024 |
0.1651 | 0.7075 | 1225 | 1.1092 | 65234672 |
0.1112 | 0.7104 | 1230 | 1.1070 | 65493112 |
0.1175 | 0.7133 | 1235 | 1.1075 | 65758712 |
0.1216 | 0.7161 | 1240 | 1.1082 | 66024680 |
0.0749 | 0.7190 | 1245 | 1.1098 | 66295584 |
0.1513 | 0.7219 | 1250 | 1.1079 | 66559520 |
0.1151 | 0.7248 | 1255 | 1.1068 | 66834168 |
0.181 | 0.7277 | 1260 | 1.1075 | 67097544 |
0.1586 | 0.7306 | 1265 | 1.1087 | 67356608 |
0.0934 | 0.7335 | 1270 | 1.1081 | 67622152 |
0.0991 | 0.7364 | 1275 | 1.1070 | 67885504 |
0.1203 | 0.7392 | 1280 | 1.1060 | 68156088 |
0.1323 | 0.7421 | 1285 | 1.1049 | 68427920 |
0.1043 | 0.7450 | 1290 | 1.1056 | 68688672 |
0.1415 | 0.7479 | 1295 | 1.1070 | 68953512 |
0.1361 | 0.7508 | 1300 | 1.1058 | 69222744 |
0.1713 | 0.7537 | 1305 | 1.1041 | 69493152 |
0.1207 | 0.7566 | 1310 | 1.1047 | 69759064 |
0.123 | 0.7595 | 1315 | 1.1051 | 70028704 |
0.1134 | 0.7623 | 1320 | 1.1061 | 70292648 |
0.1002 | 0.7652 | 1325 | 1.1054 | 70559392 |
0.1196 | 0.7681 | 1330 | 1.1049 | 70828480 |
0.1276 | 0.7710 | 1335 | 1.1047 | 71101208 |
0.1287 | 0.7739 | 1340 | 1.1054 | 71367200 |
0.109 | 0.7768 | 1345 | 1.1039 | 71634640 |
0.1795 | 0.7797 | 1350 | 1.1032 | 71902800 |
0.1094 | 0.7826 | 1355 | 1.1032 | 72174800 |
0.125 | 0.7854 | 1360 | 1.1053 | 72432704 |
0.1531 | 0.7883 | 1365 | 1.1055 | 72700696 |
0.122 | 0.7912 | 1370 | 1.1034 | 72965800 |
0.0804 | 0.7941 | 1375 | 1.1032 | 73231184 |
0.146 | 0.7970 | 1380 | 1.1033 | 73498048 |
0.1349 | 0.7999 | 1385 | 1.1025 | 73761088 |
0.107 | 0.8028 | 1390 | 1.1037 | 74028624 |
0.0812 | 0.8057 | 1395 | 1.1038 | 74291768 |
0.1222 | 0.8085 | 1400 | 1.1045 | 74563544 |
0.1458 | 0.8114 | 1405 | 1.1054 | 74832776 |
0.1657 | 0.8143 | 1410 | 1.1023 | 75101176 |
0.1954 | 0.8172 | 1415 | 1.1017 | 75369232 |
0.0891 | 0.8201 | 1420 | 1.1022 | 75638216 |
0.0955 | 0.8230 | 1425 | 1.1041 | 75900240 |
0.1365 | 0.8259 | 1430 | 1.1035 | 76159624 |
0.1079 | 0.8288 | 1435 | 1.1004 | 76422720 |
0.0682 | 0.8316 | 1440 | 1.1013 | 76686592 |
0.0583 | 0.8345 | 1445 | 1.1029 | 76949056 |
0.1214 | 0.8374 | 1450 | 1.1024 | 77218632 |
0.1268 | 0.8403 | 1455 | 1.1006 | 77478560 |
0.1053 | 0.8432 | 1460 | 1.1008 | 77743768 |
0.108 | 0.8461 | 1465 | 1.1031 | 78017344 |
0.0866 | 0.8490 | 1470 | 1.1021 | 78276936 |
0.0885 | 0.8519 | 1475 | 1.1003 | 78544376 |
0.0623 | 0.8548 | 1480 | 1.1005 | 78808896 |
0.1158 | 0.8576 | 1485 | 1.1015 | 79078776 |
0.1327 | 0.8605 | 1490 | 1.1018 | 79345224 |
0.0456 | 0.8634 | 1495 | 1.1017 | 79606728 |
0.0962 | 0.8663 | 1500 | 1.1019 | 79872616 |
0.1048 | 0.8692 | 1505 | 1.1017 | 80139096 |
0.0817 | 0.8721 | 1510 | 1.1008 | 80403584 |
0.1074 | 0.8750 | 1515 | 1.1015 | 80670528 |
0.1072 | 0.8779 | 1520 | 1.1015 | 80938992 |
0.1117 | 0.8807 | 1525 | 1.1014 | 81204304 |
0.0757 | 0.8836 | 1530 | 1.1020 | 81466168 |
0.1819 | 0.8865 | 1535 | 1.1017 | 81736616 |
0.1645 | 0.8894 | 1540 | 1.0998 | 82000800 |
0.1252 | 0.8923 | 1545 | 1.0981 | 82269024 |
0.1398 | 0.8952 | 1550 | 1.0987 | 82540928 |
0.1036 | 0.8981 | 1555 | 1.1008 | 82807760 |
0.1573 | 0.9010 | 1560 | 1.1002 | 83066216 |
0.1581 | 0.9038 | 1565 | 1.0993 | 83339256 |
0.0878 | 0.9067 | 1570 | 1.0996 | 83604536 |
0.092 | 0.9096 | 1575 | 1.1001 | 83865496 |
0.1575 | 0.9125 | 1580 | 1.0991 | 84126424 |
0.08 | 0.9154 | 1585 | 1.0989 | 84391176 |
0.0513 | 0.9183 | 1590 | 1.1008 | 84660208 |
0.1259 | 0.9212 | 1595 | 1.1026 | 84930128 |
0.15 | 0.9241 | 1600 | 1.1019 | 85195328 |
0.0984 | 0.9269 | 1605 | 1.0990 | 85453816 |
0.1439 | 0.9298 | 1610 | 1.0993 | 85723592 |
0.1366 | 0.9327 | 1615 | 1.0992 | 85987408 |
0.1144 | 0.9356 | 1620 | 1.0992 | 86250040 |
0.1167 | 0.9385 | 1625 | 1.0995 | 86516408 |
0.1447 | 0.9414 | 1630 | 1.1004 | 86779568 |
0.1233 | 0.9443 | 1635 | 1.0990 | 87049088 |
0.1037 | 0.9472 | 1640 | 1.0979 | 87317264 |
0.1341 | 0.9500 | 1645 | 1.0985 | 87581184 |
0.1036 | 0.9529 | 1650 | 1.0992 | 87846968 |
0.1435 | 0.9558 | 1655 | 1.0976 | 88100656 |
0.1207 | 0.9587 | 1660 | 1.0968 | 88363744 |
0.1299 | 0.9616 | 1665 | 1.0978 | 88629744 |
0.1279 | 0.9645 | 1670 | 1.0990 | 88894456 |
0.1122 | 0.9674 | 1675 | 1.0988 | 89162032 |
0.1317 | 0.9703 | 1680 | 1.0972 | 89431088 |
0.1591 | 0.9731 | 1685 | 1.0972 | 89703328 |
0.1128 | 0.9760 | 1690 | 1.0987 | 89967664 |
0.1896 | 0.9789 | 1695 | 1.0985 | 90232816 |
0.0941 | 0.9818 | 1700 | 1.0973 | 90500560 |
0.1163 | 0.9847 | 1705 | 1.0960 | 90769384 |
0.0629 | 0.9876 | 1710 | 1.0973 | 91037864 |
0.1257 | 0.9905 | 1715 | 1.0987 | 91299848 |
0.0984 | 0.9934 | 1720 | 1.0984 | 91567824 |
0.086 | 0.9962 | 1725 | 1.0988 | 91834832 |
0.1386 | 0.9991 | 1730 | 1.0985 | 92091840 |
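Because the table is long, it can help to parse it and look for the checkpoint with the lowest validation loss. This is a minimal sketch over a handful of rows copied from the table above; `parse_row` is a hypothetical helper, and the same approach should work on the full table.

```python
# A few rows copied verbatim from the results table above.
rows_text = """\
No log | 0 | 0 | 1.3909 | 0 |
1.2231 | 0.0231 | 40 | 1.1920 | 2134728 |
0.4708 | 0.0462 | 80 | 1.3241 | 4269560 |
0.1163 | 0.9847 | 1705 | 1.0960 | 90769384 |
0.1386 | 0.9991 | 1730 | 1.0985 | 92091840 |
"""


def parse_row(line: str) -> dict:
    """Split one pipe-delimited row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return {
        # The first row has no training loss yet ("No log").
        "train_loss": None if cells[0] == "No log" else float(cells[0]),
        "epoch": float(cells[1]),
        "step": int(cells[2]),
        "val_loss": float(cells[3]),
        "tokens": int(cells[4]),
    }


rows = [parse_row(line) for line in rows_text.splitlines() if line.strip()]
best = min(rows, key=lambda r: r["val_loss"])
print(best["step"], best["val_loss"])

# Gap between validation and training loss at the last logged step.
last = rows[-1]
print(round(last["val_loss"] - last["train_loss"], 4))
```

Among the sampled rows, the lowest validation loss is at step 1705. The large gap between training loss (around 0.1) and validation loss (around 1.1) late in training is visible directly in the table.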
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1