---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1119
- Num Input Tokens Seen: 46755704

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.7072        | 0.0059 | 5    | 1.3922          | 267024            |
| 1.5511        | 0.0117 | 10   | 1.3635          | 548560            |
| 1.5199        | 0.0176 | 15   | 1.2986          | 830488            |
| 1.4321        | 0.0234 | 20   | 1.2487          | 1101904           |
| 1.4119        | 0.0293 | 25   | 1.2016          | 1377928           |
| 1.2848        | 0.0351 | 30   | 1.1721          | 1661264           |
| 1.2144        | 0.0410 | 35   | 1.1683          | 1943544           |
| 1.1417        | 0.0468 | 40   | 1.1494          | 2221200           |
| 0.9487        | 0.0527 | 45   | 1.1817          | 2498336           |
| 0.9135        | 0.0585 | 50   | 1.1897          | 2763296           |
| 0.9445        | 0.0644 | 55   | 1.2065          | 3036752           |
| 0.8185        | 0.0702 | 60   | 1.2143          | 3310768           |
| 0.606         | 0.0761 | 65   | 1.2347          | 3581304           |
| 0.7169        | 0.0819 | 70   | 1.2386          | 3866600           |
| 0.6866        | 0.0878 | 75   | 1.2317          | 4142056           |
| 0.6366        | 0.0936 | 80   | 1.2168          | 4416216           |
| 0.5326        | 0.0995 | 85   | 1.2149          | 4686560           |
| 0.4364        | 0.1053 | 90   | 1.2276          | 4959768           |
| 0.4029        | 0.1112 | 95   | 1.2208          | 5230968           |
| 0.3969        | 0.1170 | 100  | 1.2249          | 5504384           |
| 0.4026        | 0.1229 | 105  | 1.2208          | 5775920           |
| 0.4528        | 0.1287 | 110  | 1.2238          | 6049704           |
| 0.4096        | 0.1346 | 115  | 1.2106          | 6327216           |
| 0.3988        | 0.1404 | 120  | 1.2170          | 6601008           |
| 0.4273        | 0.1463 | 125  | 1.2074          | 6877208           |
| 0.3648        | 0.1521 | 130  | 1.2093          | 7155944           |
| 0.282         | 0.1580 | 135  | 1.1978          | 7432256           |
| 0.3538        | 0.1638 | 140  | 1.2051          | 7705992           |
| 0.4239        | 0.1697 | 145  | 1.1951          | 7982024           |
| 0.4044        | 0.1755 | 150  | 1.2000          | 8253584           |
| 0.4297        | 0.1814 | 155  | 1.2035          | 8534352           |
| 0.2586        | 0.1872 | 160  | 1.1974          | 8795744           |
| 0.2682        | 0.1931 | 165  | 1.2044          | 9068272           |
| 0.3477        | 0.1989 | 170  | 1.1952          | 9346008           |
| 0.3633        | 0.2048 | 175  | 1.1954          | 9614616           |
| 0.3786        | 0.2106 | 180  | 1.1975          | 9889768           |
| 0.312         | 0.2165 | 185  | 1.1918          | 10167624          |
| 0.3204        | 0.2223 | 190  | 1.1910          | 10437856          |
| 0.3476        | 0.2282 | 195  | 1.1900          | 10712832          |
| 0.2801        | 0.2340 | 200  | 1.1882          | 10977528          |
| 0.2675        | 0.2399 | 205  | 1.1885          | 11245504          |
| 0.2818        | 0.2457 | 210  | 1.1840          | 11514312          |
| 0.2689        | 0.2516 | 215  | 1.1851          | 11793000          |
| 0.3491        | 0.2574 | 220  | 1.1872          | 12068576          |
| 0.3424        | 0.2633 | 225  | 1.1802          | 12342256          |
| 0.2694        | 0.2691 | 230  | 1.1810          | 12614032          |
| 0.4132        | 0.2750 | 235  | 1.1728          | 12882712          |
| 0.2893        | 0.2808 | 240  | 1.1700          | 13149128          |
| 0.2847        | 0.2867 | 245  | 1.1856          | 13421176          |
| 0.3198        | 0.2925 | 250  | 1.1693          | 13696120          |
| 0.2038        | 0.2984 | 255  | 1.1743          | 13965256          |
| 0.222         | 0.3042 | 260  | 1.1832          | 14243792          |
| 0.244         | 0.3101 | 265  | 1.1692          | 14524248          |
| 0.3439        | 0.3159 | 270  | 1.1722          | 14805296          |
| 0.2316        | 0.3218 | 275  | 1.1698          | 15078480          |
| 0.2024        | 0.3276 | 280  | 1.1734          | 15353592          |
| 0.2288        | 0.3335 | 285  | 1.1696          | 15628632          |
| 0.2868        | 0.3393 | 290  | 1.1661          | 15902808          |
| 0.3403        | 0.3452 | 295  | 1.1693          | 16179792          |
| 0.3238        | 0.3510 | 300  | 1.1663          | 16456880          |
| 0.236         | 0.3569 | 305  | 1.1625          | 16734104          |
| 0.1991        | 0.3627 | 310  | 1.1644          | 17008776          |
| 0.1729        | 0.3686 | 315  | 1.1646          | 17278024          |
| 0.2047        | 0.3744 | 320  | 1.1630          | 17550640          |
| 0.2911        | 0.3803 | 325  | 1.1582          | 17826960          |
| 0.1639        | 0.3861 | 330  | 1.1703          | 18098336          |
| 0.1956        | 0.3920 | 335  | 1.1660          | 18370416          |
| 0.2335        | 0.3978 | 340  | 1.1550          | 18640240          |
| 0.3123        | 0.4037 | 345  | 1.1614          | 18915192          |
| 0.2137        | 0.4095 | 350  | 1.1581          | 19182344          |
| 0.2683        | 0.4154 | 355  | 1.1541          | 19465576          |
| 0.2263        | 0.4212 | 360  | 1.1560          | 19743312          |
| 0.1861        | 0.4271 | 365  | 1.1590          | 20020896          |
| 0.2883        | 0.4329 | 370  | 1.1546          | 20294232          |
| 0.1755        | 0.4388 | 375  | 1.1525          | 20559040          |
| 0.213         | 0.4446 | 380  | 1.1534          | 20822032          |
| 0.1859        | 0.4505 | 385  | 1.1523          | 21099560          |
| 0.2529        | 0.4563 | 390  | 1.1537          | 21368144          |
| 0.242         | 0.4622 | 395  | 1.1498          | 21645832          |
| 0.1993        | 0.4680 | 400  | 1.1491          | 21924544          |
| 0.1637        | 0.4739 | 405  | 1.1509          | 22199720          |
| 0.1812        | 0.4797 | 410  | 1.1441          | 22477384          |
| 0.2141        | 0.4856 | 415  | 1.1454          | 22750888          |
| 0.2874        | 0.4914 | 420  | 1.1489          | 23027632          |
| 0.1906        | 0.4973 | 425  | 1.1413          | 23308144          |
| 0.2803        | 0.5031 | 430  | 1.1433          | 23580088          |
| 0.2174        | 0.5090 | 435  | 1.1437          | 23854088          |
| 0.2305        | 0.5148 | 440  | 1.1424          | 24134544          |
| 0.2014        | 0.5207 | 445  | 1.1465          | 24403536          |
| 0.2768        | 0.5265 | 450  | 1.1414          | 24680664          |
| 0.214         | 0.5324 | 455  | 1.1408          | 24952280          |
| 0.3169        | 0.5382 | 460  | 1.1445          | 25231192          |
| 0.2731        | 0.5441 | 465  | 1.1393          | 25505768          |
| 0.2496        | 0.5499 | 470  | 1.1391          | 25785544          |
| 0.2666        | 0.5558 | 475  | 1.1404          | 26056328          |
| 0.1958        | 0.5616 | 480  | 1.1394          | 26331200          |
| 0.1935        | 0.5675 | 485  | 1.1375          | 26610448          |
| 0.1744        | 0.5734 | 490  | 1.1368          | 26883696          |
| 0.2562        | 0.5792 | 495  | 1.1336          | 27155344          |
| 0.218         | 0.5851 | 500  | 1.1342          | 27427808          |
| 0.2348        | 0.5909 | 505  | 1.1335          | 27705544          |
| 0.2619        | 0.5968 | 510  | 1.1323          | 27974816          |
| 0.1454        | 0.6026 | 515  | 1.1351          | 28241360          |
| 0.2899        | 0.6085 | 520  | 1.1348          | 28513256          |
| 0.28          | 0.6143 | 525  | 1.1300          | 28781072          |
| 0.2314        | 0.6202 | 530  | 1.1314          | 29051688          |
| 0.1742        | 0.6260 | 535  | 1.1375          | 29322136          |
| 0.2316        | 0.6319 | 540  | 1.1320          | 29591728          |
| 0.197         | 0.6377 | 545  | 1.1289          | 29865856          |
| 0.2103        | 0.6436 | 550  | 1.1322          | 30139496          |
| 0.2218        | 0.6494 | 555  | 1.1290          | 30416656          |
| 0.205         | 0.6553 | 560  | 1.1265          | 30696792          |
| 0.1418        | 0.6611 | 565  | 1.1287          | 30971528          |
| 0.2414        | 0.6670 | 570  | 1.1276          | 31244968          |
| 0.2306        | 0.6728 | 575  | 1.1258          | 31520232          |
| 0.2341        | 0.6787 | 580  | 1.1275          | 31795864          |
| 0.2402        | 0.6845 | 585  | 1.1262          | 32069624          |
| 0.2602        | 0.6904 | 590  | 1.1263          | 32337864          |
| 0.2421        | 0.6962 | 595  | 1.1266          | 32618672          |
| 0.1608        | 0.7021 | 600  | 1.1260          | 32898536          |
| 0.266         | 0.7079 | 605  | 1.1234          | 33168224          |
| 0.1589        | 0.7138 | 610  | 1.1262          | 33433136          |
| 0.1982        | 0.7196 | 615  | 1.1257          | 33712384          |
| 0.1458        | 0.7255 | 620  | 1.1258          | 33981912          |
| 0.2513        | 0.7313 | 625  | 1.1299          | 34249392          |
| 0.1416        | 0.7372 | 630  | 1.1239          | 34521488          |
| 0.2103        | 0.7430 | 635  | 1.1246          | 34794184          |
| 0.2409        | 0.7489 | 640  | 1.1256          | 35068416          |
| 0.2248        | 0.7547 | 645  | 1.1218          | 35341160          |
| 0.2517        | 0.7606 | 650  | 1.1225          | 35618656          |
| 0.2098        | 0.7664 | 655  | 1.1215          | 35892176          |
| 0.2069        | 0.7723 | 660  | 1.1203          | 36174472          |
| 0.1857        | 0.7781 | 665  | 1.1229          | 36439872          |
| 0.2552        | 0.7840 | 670  | 1.1202          | 36714872          |
| 0.1902        | 0.7898 | 675  | 1.1188          | 36987872          |
| 0.2204        | 0.7957 | 680  | 1.1201          | 37263224          |
| 0.3015        | 0.8015 | 685  | 1.1189          | 37536992          |
| 0.2118        | 0.8074 | 690  | 1.1192          | 37793976          |
| 0.2303        | 0.8132 | 695  | 1.1178          | 38068432          |
| 0.2148        | 0.8191 | 700  | 1.1194          | 38341616          |
| 0.2132        | 0.8249 | 705  | 1.1185          | 38610776          |
| 0.1463        | 0.8308 | 710  | 1.1194          | 38888584          |
| 0.1878        | 0.8366 | 715  | 1.1210          | 39160392          |
| 0.275         | 0.8425 | 720  | 1.1178          | 39426336          |
| 0.1686        | 0.8483 | 725  | 1.1164          | 39698280          |
| 0.1518        | 0.8542 | 730  | 1.1198          | 39967168          |
| 0.2153        | 0.8600 | 735  | 1.1186          | 40242904          |
| 0.22          | 0.8659 | 740  | 1.1163          | 40515024          |
| 0.2084        | 0.8717 | 745  | 1.1172          | 40786080          |
| 0.264         | 0.8776 | 750  | 1.1143          | 41059704          |
| 0.1918        | 0.8834 | 755  | 1.1147          | 41331008          |
| 0.2444        | 0.8893 | 760  | 1.1154          | 41603928          |
| 0.1433        | 0.8951 | 765  | 1.1158          | 41873784          |
| 0.2206        | 0.9010 | 770  | 1.1152          | 42140496          |
| 0.204         | 0.9068 | 775  | 1.1131          | 42415368          |
| 0.1427        | 0.9127 | 780  | 1.1143          | 42697792          |
| 0.2541        | 0.9185 | 785  | 1.1149          | 42976216          |
| 0.2033        | 0.9244 | 790  | 1.1160          | 43250816          |
| 0.1249        | 0.9302 | 795  | 1.1139          | 43531552          |
| 0.158         | 0.9361 | 800  | 1.1146          | 43811968          |
| 0.1552        | 0.9419 | 805  | 1.1154          | 44080032          |
| 0.1523        | 0.9478 | 810  | 1.1141          | 44351688          |
| 0.1709        | 0.9536 | 815  | 1.1129          | 44631208          |
| 0.133         | 0.9595 | 820  | 1.1129          | 44900816          |
| 0.2698        | 0.9653 | 825  | 1.1133          | 45173960          |
| 0.1856        | 0.9712 | 830  | 1.1131          | 45444920          |
| 0.2218        | 0.9770 | 835  | 1.1141          | 45715920          |
| 0.1803        | 0.9829 | 840  | 1.1164          | 45990080          |
| 0.2412        | 0.9887 | 845  | 1.1138          | 46264088          |
| 0.3314        | 0.9946 | 850  | 1.1103          | 46540904          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
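The `constant_with_warmup` schedule above ramps the learning rate linearly from 0 to the base value over the warmup steps, then holds it constant. A minimal sketch of that rule, assuming roughly 855 optimizer steps for this run (so a warmup of about 43 steps at ratio 0.05 — both numbers are estimates, not logged values):

```python
def lr_at_step(step: int, base_lr: float = 8e-6, warmup_steps: int = 43) -> float:
    """Learning rate under a constant-with-warmup schedule.

    Linear ramp from 0 to base_lr over warmup_steps, then constant.
    warmup_steps=43 assumes ~855 total steps at warmup ratio 0.05.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr

# At step 0 the rate is 0; from step 43 onward it stays at 8e-6.
```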
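Since the card gives no usage section, here is a hedged loading sketch with the standard `transformers` auto classes; the `model_id` placeholder is hypothetical and should be replaced with the actual repository id of this checkpoint:

```python
def load_model(model_id: str = "your-org/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0"):
    """Load tokenizer and model for causal-LM inference.

    The default model_id is a hypothetical placeholder, not a real repo.
    """
    # Import inside the function so the sketch can be read without
    # transformers installed; loading also requires the model weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```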