---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0939
- Num Input Tokens Seen: 36687080

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
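
For reference, here is a minimal sketch of how the hyperparameters above map onto a TRL `SFTConfig`/`SFTTrainer` run. It is not the actual training script: the dataset source, text column, and output directory are hypothetical placeholders, since the training data is not documented in this card.

```python
# Hypothetical reconstruction of the training setup from the listed
# hyperparameters; the dataset path and text column are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: the actual training data for this model is unknown.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0",
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    gradient_accumulation_steps=16,  # 8 * 16 = total_train_batch_size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=0,
    dataset_text_field="text",       # assumes a plain-text column named "text"
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",       # base model, loaded from the Hub by TRL
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```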

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5743 | 0.0076 | 5 | 1.3850 | 286024 |
| 1.5698 | 0.0152 | 10 | 1.3359 | 565176 |
| 1.5023 | 0.0227 | 15 | 1.2721 | 843224 |
| 1.3784 | 0.0303 | 20 | 1.2210 | 1128808 |
| 1.1853 | 0.0379 | 25 | 1.1834 | 1409632 |
| 1.079 | 0.0455 | 30 | 1.1911 | 1688000 |
| 0.9274 | 0.0531 | 35 | 1.2022 | 1961576 |
| 0.8275 | 0.0607 | 40 | 1.2078 | 2242896 |
| 0.6817 | 0.0682 | 45 | 1.2485 | 2524032 |
| 0.5892 | 0.0758 | 50 | 1.2344 | 2801792 |
| 0.4418 | 0.0834 | 55 | 1.2415 | 3078040 |
| 0.4992 | 0.0910 | 60 | 1.1980 | 3358368 |
| 0.4529 | 0.0986 | 65 | 1.2040 | 3643320 |
| 0.4315 | 0.1062 | 70 | 1.2063 | 3920184 |
| 0.3633 | 0.1137 | 75 | 1.1887 | 4195744 |
| 0.3498 | 0.1213 | 80 | 1.1900 | 4474088 |
| 0.5205 | 0.1289 | 85 | 1.1810 | 4750552 |
| 0.4456 | 0.1365 | 90 | 1.1784 | 5033120 |
| 0.2259 | 0.1441 | 95 | 1.1689 | 5308224 |
| 0.2957 | 0.1517 | 100 | 1.1673 | 5584192 |
| 0.2861 | 0.1592 | 105 | 1.1622 | 5855384 |
| 0.396 | 0.1668 | 110 | 1.1576 | 6135472 |
| 0.2727 | 0.1744 | 115 | 1.1593 | 6417808 |
| 0.2863 | 0.1820 | 120 | 1.1536 | 6694768 |
| 0.3506 | 0.1896 | 125 | 1.1512 | 6974920 |
| 0.3593 | 0.1972 | 130 | 1.1506 | 7250952 |
| 0.3129 | 0.2047 | 135 | 1.1464 | 7528424 |
| 0.305 | 0.2123 | 140 | 1.1471 | 7796288 |
| 0.2969 | 0.2199 | 145 | 1.1458 | 8071736 |
| 0.3828 | 0.2275 | 150 | 1.1450 | 8354136 |
| 0.2908 | 0.2351 | 155 | 1.1426 | 8627856 |
| 0.3691 | 0.2427 | 160 | 1.1403 | 8906272 |
| 0.248 | 0.2502 | 165 | 1.1434 | 9190272 |
| 0.2853 | 0.2578 | 170 | 1.1398 | 9467688 |
| 0.336 | 0.2654 | 175 | 1.1423 | 9745264 |
| 0.2295 | 0.2730 | 180 | 1.1392 | 10022808 |
| 0.2522 | 0.2806 | 185 | 1.1382 | 10307056 |
| 0.2513 | 0.2882 | 190 | 1.1442 | 10582992 |
| 0.2799 | 0.2957 | 195 | 1.1370 | 10866240 |
| 0.2176 | 0.3033 | 200 | 1.1359 | 11148368 |
| 0.293 | 0.3109 | 205 | 1.1353 | 11433232 |
| 0.3076 | 0.3185 | 210 | 1.1317 | 11705656 |
| 0.2469 | 0.3261 | 215 | 1.1337 | 11983632 |
| 0.3734 | 0.3336 | 220 | 1.1323 | 12266112 |
| 0.2704 | 0.3412 | 225 | 1.1290 | 12547976 |
| 0.3469 | 0.3488 | 230 | 1.1300 | 12824592 |
| 0.3266 | 0.3564 | 235 | 1.1280 | 13098760 |
| 0.2528 | 0.3640 | 240 | 1.1268 | 13368616 |
| 0.2867 | 0.3716 | 245 | 1.1266 | 13650008 |
| 0.228 | 0.3791 | 250 | 1.1262 | 13927240 |
| 0.233 | 0.3867 | 255 | 1.1249 | 14203184 |
| 0.2724 | 0.3943 | 260 | 1.1250 | 14475384 |
| 0.2117 | 0.4019 | 265 | 1.1245 | 14760384 |
| 0.1981 | 0.4095 | 270 | 1.1226 | 15040960 |
| 0.2519 | 0.4171 | 275 | 1.1219 | 15323064 |
| 0.4068 | 0.4246 | 280 | 1.1205 | 15603904 |
| 0.2811 | 0.4322 | 285 | 1.1214 | 15883608 |
| 0.259 | 0.4398 | 290 | 1.1201 | 16159520 |
| 0.2938 | 0.4474 | 295 | 1.1208 | 16437656 |
| 0.2466 | 0.4550 | 300 | 1.1214 | 16716952 |
| 0.2997 | 0.4626 | 305 | 1.1162 | 16992344 |
| 0.2268 | 0.4701 | 310 | 1.1229 | 17268760 |
| 0.343 | 0.4777 | 315 | 1.1172 | 17547648 |
| 0.2424 | 0.4853 | 320 | 1.1154 | 17828288 |
| 0.2849 | 0.4929 | 325 | 1.1172 | 18107576 |
| 0.478 | 0.5005 | 330 | 1.1155 | 18387728 |
| 0.1959 | 0.5081 | 335 | 1.1162 | 18667088 |
| 0.1868 | 0.5156 | 340 | 1.1160 | 18950480 |
| 0.234 | 0.5232 | 345 | 1.1150 | 19228760 |
| 0.2519 | 0.5308 | 350 | 1.1135 | 19508952 |
| 0.2625 | 0.5384 | 355 | 1.1145 | 19787448 |
| 0.3843 | 0.5460 | 360 | 1.1109 | 20073168 |
| 0.3005 | 0.5536 | 365 | 1.1109 | 20343008 |
| 0.1833 | 0.5611 | 370 | 1.1110 | 20623352 |
| 0.2446 | 0.5687 | 375 | 1.1093 | 20901240 |
| 0.25 | 0.5763 | 380 | 1.1104 | 21185296 |
| 0.2897 | 0.5839 | 385 | 1.1103 | 21464672 |
| 0.168 | 0.5915 | 390 | 1.1099 | 21743520 |
| 0.2387 | 0.5991 | 395 | 1.1106 | 22023544 |
| 0.2066 | 0.6066 | 400 | 1.1072 | 22291944 |
| 0.2191 | 0.6142 | 405 | 1.1089 | 22572096 |
| 0.1869 | 0.6218 | 410 | 1.1085 | 22849472 |
| 0.1939 | 0.6294 | 415 | 1.1075 | 23126440 |
| 0.2368 | 0.6370 | 420 | 1.1091 | 23406096 |
| 0.2209 | 0.6445 | 425 | 1.1066 | 23678072 |
| 0.2523 | 0.6521 | 430 | 1.1077 | 23961192 |
| 0.2416 | 0.6597 | 435 | 1.1082 | 24240520 |
| 0.1964 | 0.6673 | 440 | 1.1057 | 24520856 |
| 0.2369 | 0.6749 | 445 | 1.1055 | 24798288 |
| 0.23 | 0.6825 | 450 | 1.1074 | 25075848 |
| 0.2349 | 0.6900 | 455 | 1.1046 | 25344112 |
| 0.243 | 0.6976 | 460 | 1.1063 | 25625216 |
| 0.3343 | 0.7052 | 465 | 1.1066 | 25901904 |
| 0.2341 | 0.7128 | 470 | 1.1042 | 26177128 |
| 0.283 | 0.7204 | 475 | 1.1059 | 26459400 |
| 0.3112 | 0.7280 | 480 | 1.1066 | 26736784 |
| 0.3015 | 0.7355 | 485 | 1.1042 | 27017152 |
| 0.2788 | 0.7431 | 490 | 1.1031 | 27295048 |
| 0.1838 | 0.7507 | 495 | 1.1025 | 27575392 |
| 0.2366 | 0.7583 | 500 | 1.1036 | 27852328 |
| 0.297 | 0.7659 | 505 | 1.1032 | 28130032 |
| 0.1622 | 0.7735 | 510 | 1.1015 | 28407672 |
| 0.165 | 0.7810 | 515 | 1.1012 | 28680696 |
| 0.3047 | 0.7886 | 520 | 1.1010 | 28957216 |
| 0.336 | 0.7962 | 525 | 1.1012 | 29235048 |
| 0.2728 | 0.8038 | 530 | 1.1011 | 29507352 |
| 0.2007 | 0.8114 | 535 | 1.1008 | 29778208 |
| 0.2253 | 0.8190 | 540 | 1.1013 | 30055416 |
| 0.2386 | 0.8265 | 545 | 1.0982 | 30333728 |
| 0.2056 | 0.8341 | 550 | 1.0989 | 30599088 |
| 0.2879 | 0.8417 | 555 | 1.1003 | 30883072 |
| 0.2207 | 0.8493 | 560 | 1.0993 | 31160232 |
| 0.2821 | 0.8569 | 565 | 1.0979 | 31441272 |
| 0.2246 | 0.8645 | 570 | 1.0982 | 31712696 |
| 0.3249 | 0.8720 | 575 | 1.0980 | 31991400 |
| 0.2616 | 0.8796 | 580 | 1.0985 | 32269224 |
| 0.2716 | 0.8872 | 585 | 1.0997 | 32542384 |
| 0.2898 | 0.8948 | 590 | 1.0979 | 32826016 |
| 0.2617 | 0.9024 | 595 | 1.0968 | 33110848 |
| 0.2057 | 0.9100 | 600 | 1.0988 | 33391352 |
| 0.293 | 0.9175 | 605 | 1.0965 | 33670472 |
| 0.2081 | 0.9251 | 610 | 1.0947 | 33950936 |
| 0.2801 | 0.9327 | 615 | 1.0963 | 34226952 |
| 0.2678 | 0.9403 | 620 | 1.0952 | 34502376 |
| 0.222 | 0.9479 | 625 | 1.0944 | 34774480 |
| 0.2561 | 0.9555 | 630 | 1.0944 | 35057720 |
| 0.2738 | 0.9630 | 635 | 1.0947 | 35333096 |
| 0.182 | 0.9706 | 640 | 1.0947 | 35614552 |
| 0.224 | 0.9782 | 645 | 1.0935 | 35890992 |
| 0.2861 | 0.9858 | 650 | 1.0935 | 36177736 |
| 0.2674 | 0.9934 | 655 | 1.0948 | 36462944 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
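
Since the usage sections above are unfilled, the snippet below is only a generic loading sketch; the repository id is a placeholder for wherever this checkpoint is actually published.

```python
# Generic inference sketch; "collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0"
# is a placeholder for the full Hub repository id of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 checkpoints are commonly run in bfloat16
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```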