# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1004
- Num Input Tokens Seen: 20726616
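
A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub under the repo id `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2` (the id this card appears under) and that `transformers` and `torch` are installed:

```python
# Minimal inference sketch for this checkpoint; the repo id below is the one
# this card is published under. Not an official usage recipe for the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```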
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
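
These settings give an effective batch size of train_batch_size × gradient_accumulation_steps = 8 × 16 = 128, matching the total_train_batch_size above. As a rough guide, the values map onto `transformers.TrainingArguments` as sketched here; this is a hypothetical reconstruction, since the actual training script is not part of this card:

```python
# Hypothetical reconstruction of the hyperparameters above as
# transformers.TrainingArguments; the real training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # effective train batch: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,       # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```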
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.5618 | 0.0133 | 5 | 1.3747 | 274336 |
1.4834 | 0.0266 | 10 | 1.2818 | 548560 |
1.2778 | 0.0399 | 15 | 1.2113 | 826768 |
1.2063 | 0.0532 | 20 | 1.1648 | 1100984 |
1.0763 | 0.0666 | 25 | 1.1554 | 1381272 |
1.0008 | 0.0799 | 30 | 1.1420 | 1655904 |
1.0066 | 0.0932 | 35 | 1.1522 | 1934384 |
1.0122 | 0.1065 | 40 | 1.1650 | 2209128 |
0.8869 | 0.1198 | 45 | 1.1676 | 2482008 |
0.8353 | 0.1331 | 50 | 1.1729 | 2757616 |
0.7535 | 0.1464 | 55 | 1.1702 | 3028816 |
0.677 | 0.1597 | 60 | 1.1699 | 3306688 |
0.6353 | 0.1730 | 65 | 1.1718 | 3583176 |
0.7474 | 0.1864 | 70 | 1.1582 | 3862120 |
0.6487 | 0.1997 | 75 | 1.1621 | 4134624 |
0.5399 | 0.2130 | 80 | 1.1678 | 4413112 |
0.4752 | 0.2263 | 85 | 1.1588 | 4680680 |
0.6822 | 0.2396 | 90 | 1.1598 | 4959520 |
0.5627 | 0.2529 | 95 | 1.1590 | 5237032 |
0.5604 | 0.2662 | 100 | 1.1571 | 5520816 |
0.4439 | 0.2795 | 105 | 1.1547 | 5791784 |
0.5118 | 0.2928 | 110 | 1.1562 | 6070648 |
0.5673 | 0.3062 | 115 | 1.1532 | 6350816 |
0.5077 | 0.3195 | 120 | 1.1491 | 6624856 |
0.4819 | 0.3328 | 125 | 1.1451 | 6903024 |
0.4622 | 0.3461 | 130 | 1.1461 | 7179008 |
0.5332 | 0.3594 | 135 | 1.1403 | 7459288 |
0.4536 | 0.3727 | 140 | 1.1447 | 7736168 |
0.4125 | 0.3860 | 145 | 1.1386 | 8007400 |
0.4507 | 0.3993 | 150 | 1.1381 | 8280296 |
0.4411 | 0.4126 | 155 | 1.1353 | 8563096 |
0.4867 | 0.4260 | 160 | 1.1342 | 8835744 |
0.4239 | 0.4393 | 165 | 1.1335 | 9116184 |
0.5198 | 0.4526 | 170 | 1.1308 | 9394976 |
0.502 | 0.4659 | 175 | 1.1320 | 9676488 |
0.5138 | 0.4792 | 180 | 1.1265 | 9952384 |
0.4501 | 0.4925 | 185 | 1.1288 | 10223640 |
0.4448 | 0.5058 | 190 | 1.1268 | 10503360 |
0.4864 | 0.5191 | 195 | 1.1272 | 10783504 |
0.5137 | 0.5324 | 200 | 1.1228 | 11061016 |
0.4463 | 0.5458 | 205 | 1.1251 | 11334176 |
0.5183 | 0.5591 | 210 | 1.1237 | 11611680 |
0.4873 | 0.5724 | 215 | 1.1226 | 11889528 |
0.4598 | 0.5857 | 220 | 1.1200 | 12165672 |
0.4974 | 0.5990 | 225 | 1.1180 | 12447680 |
0.307 | 0.6123 | 230 | 1.1191 | 12719352 |
0.4302 | 0.6256 | 235 | 1.1154 | 12992608 |
0.3704 | 0.6389 | 240 | 1.1187 | 13269640 |
0.43 | 0.6522 | 245 | 1.1155 | 13545056 |
0.3751 | 0.6656 | 250 | 1.1142 | 13821752 |
0.349 | 0.6789 | 255 | 1.1122 | 14096592 |
0.4908 | 0.6922 | 260 | 1.1105 | 14370976 |
0.4156 | 0.7055 | 265 | 1.1105 | 14647576 |
0.3021 | 0.7188 | 270 | 1.1102 | 14927104 |
0.4337 | 0.7321 | 275 | 1.1104 | 15202424 |
0.4187 | 0.7454 | 280 | 1.1080 | 15479160 |
0.3928 | 0.7587 | 285 | 1.1124 | 15758584 |
0.4093 | 0.7720 | 290 | 1.1058 | 16040872 |
0.474 | 0.7854 | 295 | 1.1074 | 16312664 |
0.4337 | 0.7987 | 300 | 1.1079 | 16592008 |
0.2634 | 0.8120 | 305 | 1.1057 | 16866912 |
0.3113 | 0.8253 | 310 | 1.1055 | 17146272 |
0.4897 | 0.8386 | 315 | 1.1059 | 17425624 |
0.4663 | 0.8519 | 320 | 1.1031 | 17698920 |
0.4878 | 0.8652 | 325 | 1.1059 | 17972416 |
0.3575 | 0.8785 | 330 | 1.1049 | 18246352 |
0.406 | 0.8918 | 335 | 1.1022 | 18522448 |
0.4651 | 0.9052 | 340 | 1.1042 | 18798208 |
0.4508 | 0.9185 | 345 | 1.1032 | 19069304 |
0.442 | 0.9318 | 350 | 1.1019 | 19352272 |
0.3781 | 0.9451 | 355 | 1.1029 | 19630952 |
0.4462 | 0.9584 | 360 | 1.0998 | 19903896 |
0.3345 | 0.9717 | 365 | 1.1027 | 20176392 |
0.4672 | 0.9850 | 370 | 1.1001 | 20451160 |
0.3621 | 0.9983 | 375 | 1.1004 | 20726616 |
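
Assuming the reported validation loss is mean per-token cross-entropy in nats (the usual `transformers` convention), it converts to perplexity via exp(loss); a quick check for the final row:

```python
# Perplexity from mean cross-entropy loss: ppl = exp(loss).
import math

final_val_loss = 1.1004  # final validation loss from the table above
print(f"perplexity ≈ {math.exp(final_val_loss):.2f}")  # ≈ 3.01
```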
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1