---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1004
- Num Input Tokens Seen: 20726616

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
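For reference, a minimal sketch of how these settings map onto `transformers.TrainingArguments`; the `output_dir` is an assumption, and the dataset/trainer wiring is omitted because the card does not specify them:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
# output_dir is a placeholder; the card does not name one.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```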
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5618 | 0.0133 | 5 | 1.3747 | 274336 |
| 1.4834 | 0.0266 | 10 | 1.2818 | 548560 |
| 1.2778 | 0.0399 | 15 | 1.2113 | 826768 |
| 1.2063 | 0.0532 | 20 | 1.1648 | 1100984 |
| 1.0763 | 0.0666 | 25 | 1.1554 | 1381272 |
| 1.0008 | 0.0799 | 30 | 1.1420 | 1655904 |
| 1.0066 | 0.0932 | 35 | 1.1522 | 1934384 |
| 1.0122 | 0.1065 | 40 | 1.1650 | 2209128 |
| 0.8869 | 0.1198 | 45 | 1.1676 | 2482008 |
| 0.8353 | 0.1331 | 50 | 1.1729 | 2757616 |
| 0.7535 | 0.1464 | 55 | 1.1702 | 3028816 |
| 0.677 | 0.1597 | 60 | 1.1699 | 3306688 |
| 0.6353 | 0.1730 | 65 | 1.1718 | 3583176 |
| 0.7474 | 0.1864 | 70 | 1.1582 | 3862120 |
| 0.6487 | 0.1997 | 75 | 1.1621 | 4134624 |
| 0.5399 | 0.2130 | 80 | 1.1678 | 4413112 |
| 0.4752 | 0.2263 | 85 | 1.1588 | 4680680 |
| 0.6822 | 0.2396 | 90 | 1.1598 | 4959520 |
| 0.5627 | 0.2529 | 95 | 1.1590 | 5237032 |
| 0.5604 | 0.2662 | 100 | 1.1571 | 5520816 |
| 0.4439 | 0.2795 | 105 | 1.1547 | 5791784 |
| 0.5118 | 0.2928 | 110 | 1.1562 | 6070648 |
| 0.5673 | 0.3062 | 115 | 1.1532 | 6350816 |
| 0.5077 | 0.3195 | 120 | 1.1491 | 6624856 |
| 0.4819 | 0.3328 | 125 | 1.1451 | 6903024 |
| 0.4622 | 0.3461 | 130 | 1.1461 | 7179008 |
| 0.5332 | 0.3594 | 135 | 1.1403 | 7459288 |
| 0.4536 | 0.3727 | 140 | 1.1447 | 7736168 |
| 0.4125 | 0.3860 | 145 | 1.1386 | 8007400 |
| 0.4507 | 0.3993 | 150 | 1.1381 | 8280296 |
| 0.4411 | 0.4126 | 155 | 1.1353 | 8563096 |
| 0.4867 | 0.4260 | 160 | 1.1342 | 8835744 |
| 0.4239 | 0.4393 | 165 | 1.1335 | 9116184 |
| 0.5198 | 0.4526 | 170 | 1.1308 | 9394976 |
| 0.502 | 0.4659 | 175 | 1.1320 | 9676488 |
| 0.5138 | 0.4792 | 180 | 1.1265 | 9952384 |
| 0.4501 | 0.4925 | 185 | 1.1288 | 10223640 |
| 0.4448 | 0.5058 | 190 | 1.1268 | 10503360 |
| 0.4864 | 0.5191 | 195 | 1.1272 | 10783504 |
| 0.5137 | 0.5324 | 200 | 1.1228 | 11061016 |
| 0.4463 | 0.5458 | 205 | 1.1251 | 11334176 |
| 0.5183 | 0.5591 | 210 | 1.1237 | 11611680 |
| 0.4873 | 0.5724 | 215 | 1.1226 | 11889528 |
| 0.4598 | 0.5857 | 220 | 1.1200 | 12165672 |
| 0.4974 | 0.5990 | 225 | 1.1180 | 12447680 |
| 0.307 | 0.6123 | 230 | 1.1191 | 12719352 |
| 0.4302 | 0.6256 | 235 | 1.1154 | 12992608 |
| 0.3704 | 0.6389 | 240 | 1.1187 | 13269640 |
| 0.43 | 0.6522 | 245 | 1.1155 | 13545056 |
| 0.3751 | 0.6656 | 250 | 1.1142 | 13821752 |
| 0.349 | 0.6789 | 255 | 1.1122 | 14096592 |
| 0.4908 | 0.6922 | 260 | 1.1105 | 14370976 |
| 0.4156 | 0.7055 | 265 | 1.1105 | 14647576 |
| 0.3021 | 0.7188 | 270 | 1.1102 | 14927104 |
| 0.4337 | 0.7321 | 275 | 1.1104 | 15202424 |
| 0.4187 | 0.7454 | 280 | 1.1080 | 15479160 |
| 0.3928 | 0.7587 | 285 | 1.1124 | 15758584 |
| 0.4093 | 0.7720 | 290 | 1.1058 | 16040872 |
| 0.474 | 0.7854 | 295 | 1.1074 | 16312664 |
| 0.4337 | 0.7987 | 300 | 1.1079 | 16592008 |
| 0.2634 | 0.8120 | 305 | 1.1057 | 16866912 |
| 0.3113 | 0.8253 | 310 | 1.1055 | 17146272 |
| 0.4897 | 0.8386 | 315 | 1.1059 | 17425624 |
| 0.4663 | 0.8519 | 320 | 1.1031 | 17698920 |
| 0.4878 | 0.8652 | 325 | 1.1059 | 17972416 |
| 0.3575 | 0.8785 | 330 | 1.1049 | 18246352 |
| 0.406 | 0.8918 | 335 | 1.1022 | 18522448 |
| 0.4651 | 0.9052 | 340 | 1.1042 | 18798208 |
| 0.4508 | 0.9185 | 345 | 1.1032 | 19069304 |
| 0.442 | 0.9318 | 350 | 1.1019 | 19352272 |
| 0.3781 | 0.9451 | 355 | 1.1029 | 19630952 |
| 0.4462 | 0.9584 | 360 | 1.0998 | 19903896 |
| 0.3345 | 0.9717 | 365 | 1.1027 | 20176392 |
| 0.4672 | 0.9850 | 370 | 1.1001 | 20451160 |
| 0.3621 | 0.9983 | 375 | 1.1004 | 20726616 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
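## How to use

A minimal inference sketch, assuming the checkpoint is published under the model name above (adjust `model_id` to the actual repository id):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with wherever this checkpoint is hosted.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```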