# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1017
- Num Input Tokens Seen: 25730952
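As a minimal usage sketch (assuming the standard `transformers` causal-LM loading pattern; the prompt below is a placeholder, not from this card):

```python
# Minimal sketch: loading the checkpoint with the standard transformers
# causal-LM API. The prompt is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```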
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
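For reference, a hedged sketch of how these settings map onto `transformers.TrainingArguments` (the output directory is a placeholder, and model/dataset wiring is omitted; the total train batch size of 128 follows from 8 per-device samples × 16 accumulation steps):

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
# output_dir is a placeholder; trainer/model/dataset setup is omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,  # total_train_batch_size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```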
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:--------------|:------|:-----|:----------------|:------------------|
No log | 0 | 0 | 1.3909 | 0 |
1.5411 | 0.0106 | 5 | 1.3783 | 272168 |
1.5918 | 0.0213 | 10 | 1.3022 | 547248 |
1.3917 | 0.0319 | 15 | 1.2334 | 827136 |
1.2769 | 0.0425 | 20 | 1.1817 | 1108760 |
1.1975 | 0.0532 | 25 | 1.1698 | 1382240 |
1.037 | 0.0638 | 30 | 1.1491 | 1659392 |
0.9364 | 0.0744 | 35 | 1.1711 | 1933320 |
0.8512 | 0.0851 | 40 | 1.1805 | 2217472 |
0.8118 | 0.0957 | 45 | 1.1890 | 2494616 |
0.7426 | 0.1063 | 50 | 1.1920 | 2767552 |
0.687 | 0.1170 | 55 | 1.1935 | 3030400 |
0.6747 | 0.1276 | 60 | 1.1881 | 3301288 |
0.6189 | 0.1382 | 65 | 1.1822 | 3574336 |
0.6121 | 0.1489 | 70 | 1.1785 | 3843792 |
0.5065 | 0.1595 | 75 | 1.1724 | 4118648 |
0.5733 | 0.1701 | 80 | 1.1710 | 4387800 |
0.5961 | 0.1808 | 85 | 1.1766 | 4659672 |
0.5097 | 0.1914 | 90 | 1.1727 | 4933736 |
0.4812 | 0.2020 | 95 | 1.1689 | 5213232 |
0.4241 | 0.2127 | 100 | 1.1730 | 5484456 |
0.5009 | 0.2233 | 105 | 1.1617 | 5759048 |
0.4416 | 0.2339 | 110 | 1.1703 | 6035320 |
0.4452 | 0.2446 | 115 | 1.1592 | 6306832 |
0.3983 | 0.2552 | 120 | 1.1651 | 6575048 |
0.4051 | 0.2658 | 125 | 1.1574 | 6846416 |
0.4605 | 0.2764 | 130 | 1.1602 | 7119824 |
0.3852 | 0.2871 | 135 | 1.1570 | 7399680 |
0.4569 | 0.2977 | 140 | 1.1494 | 7679448 |
0.3371 | 0.3083 | 145 | 1.1536 | 7948392 |
0.4216 | 0.3190 | 150 | 1.1492 | 8221992 |
0.4162 | 0.3296 | 155 | 1.1495 | 8497688 |
0.4242 | 0.3402 | 160 | 1.1470 | 8769288 |
0.5207 | 0.3509 | 165 | 1.1482 | 9040440 |
0.5184 | 0.3615 | 170 | 1.1438 | 9303304 |
0.4073 | 0.3721 | 175 | 1.1446 | 9579608 |
0.5278 | 0.3828 | 180 | 1.1419 | 9852200 |
0.3397 | 0.3934 | 185 | 1.1405 | 10120216 |
0.3696 | 0.4040 | 190 | 1.1374 | 10399376 |
0.4079 | 0.4147 | 195 | 1.1387 | 10669696 |
0.3999 | 0.4253 | 200 | 1.1354 | 10945120 |
0.3623 | 0.4359 | 205 | 1.1349 | 11217216 |
0.3865 | 0.4466 | 210 | 1.1345 | 11490240 |
0.3609 | 0.4572 | 215 | 1.1319 | 11764136 |
0.329 | 0.4678 | 220 | 1.1320 | 12035936 |
0.318 | 0.4785 | 225 | 1.1304 | 12309960 |
0.3688 | 0.4891 | 230 | 1.1303 | 12587360 |
0.3825 | 0.4997 | 235 | 1.1296 | 12864056 |
0.3342 | 0.5104 | 240 | 1.1266 | 13141392 |
0.3556 | 0.5210 | 245 | 1.1297 | 13409248 |
0.3922 | 0.5316 | 250 | 1.1232 | 13685608 |
0.2913 | 0.5423 | 255 | 1.1275 | 13960768 |
0.2877 | 0.5529 | 260 | 1.1267 | 14229912 |
0.3073 | 0.5635 | 265 | 1.1215 | 14504880 |
0.3047 | 0.5742 | 270 | 1.1249 | 14781040 |
0.3112 | 0.5848 | 275 | 1.1212 | 15052056 |
0.3715 | 0.5954 | 280 | 1.1204 | 15331080 |
0.3126 | 0.6061 | 285 | 1.1210 | 15594416 |
0.2426 | 0.6167 | 290 | 1.1199 | 15871488 |
0.3172 | 0.6273 | 295 | 1.1201 | 16148664 |
0.3546 | 0.6380 | 300 | 1.1180 | 16420880 |
0.3447 | 0.6486 | 305 | 1.1167 | 16691672 |
0.3834 | 0.6592 | 310 | 1.1152 | 16963912 |
0.3802 | 0.6699 | 315 | 1.1149 | 17234816 |
0.4121 | 0.6805 | 320 | 1.1133 | 17507216 |
0.3417 | 0.6911 | 325 | 1.1138 | 17782816 |
0.3381 | 0.7018 | 330 | 1.1137 | 18051064 |
0.3219 | 0.7124 | 335 | 1.1119 | 18317872 |
0.3273 | 0.7230 | 340 | 1.1115 | 18592672 |
0.382 | 0.7337 | 345 | 1.1110 | 18868536 |
0.2966 | 0.7443 | 350 | 1.1109 | 19141216 |
0.3398 | 0.7549 | 355 | 1.1137 | 19414104 |
0.3522 | 0.7656 | 360 | 1.1101 | 19690832 |
0.2731 | 0.7762 | 365 | 1.1126 | 19964864 |
0.4028 | 0.7868 | 370 | 1.1089 | 20238104 |
0.3434 | 0.7974 | 375 | 1.1078 | 20510528 |
0.3365 | 0.8081 | 380 | 1.1091 | 20788304 |
0.3795 | 0.8187 | 385 | 1.1099 | 21068592 |
0.3514 | 0.8293 | 390 | 1.1061 | 21347680 |
0.3104 | 0.8400 | 395 | 1.1073 | 21620912 |
0.2955 | 0.8506 | 400 | 1.1061 | 21895216 |
0.3423 | 0.8612 | 405 | 1.1049 | 22169448 |
0.3246 | 0.8719 | 410 | 1.1072 | 22443272 |
0.3157 | 0.8825 | 415 | 1.1059 | 22717032 |
0.3253 | 0.8931 | 420 | 1.1058 | 22985352 |
0.4123 | 0.9038 | 425 | 1.1068 | 23257848 |
0.2308 | 0.9144 | 430 | 1.1055 | 23530088 |
0.3211 | 0.9250 | 435 | 1.1055 | 23802936 |
0.3404 | 0.9357 | 440 | 1.1038 | 24081728 |
0.2566 | 0.9463 | 445 | 1.1033 | 24356968 |
0.3221 | 0.9569 | 450 | 1.1028 | 24630208 |
0.3999 | 0.9676 | 455 | 1.1022 | 24903936 |
0.3544 | 0.9782 | 460 | 1.1022 | 25182688 |
0.287 | 0.9888 | 465 | 1.1035 | 25458936 |
0.2694 | 0.9995 | 470 | 1.1017 | 25730952 |
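The validation loss falls steeply over the first ~30 steps, ticks up briefly around steps 35-55, and then declines gradually to 1.1017 by the end of the epoch. Assuming the logged losses are mean per-token cross-entropy in nats (the Trainer default), they convert to perplexity as exp(loss):

```python
import math

# Convert a few logged validation losses from the table above to
# perplexity, assuming mean per-token cross-entropy in nats.
checkpoints = {0: 1.3909, 30: 1.1491, 470: 1.1017}
for step, loss in checkpoints.items():
    print(f"step {step:>3}: loss={loss:.4f}  perplexity={math.exp(loss):.2f}")
```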
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
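To reproduce this environment, the pinned versions can be checked at runtime; a small sanity-check sketch:

```python
# Sanity check: confirm the environment matches the pinned versions above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"got {installed[name]}"
    print(f"{name}: expected {want} -> {status}")
```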