# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1045
- Num Input Tokens Seen: 36166336
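The checkpoint loads like any other `transformers` causal LM. Below is a minimal inference sketch, assuming the checkpoint is public under the `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2` repo id and that the framework versions pinned at the bottom of this card (or newer) are installed:

```python
# Minimal inference sketch (assumptions: the repo id below is public and a
# transformers/PyTorch stack matching the pinned versions is installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Generate a short continuation to sanity-check the checkpoint.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```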
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
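For reference, the list above maps onto Hugging Face `TrainingArguments` as sketched below. This is a hedged reconstruction, not the original training script: the dataset, `Trainer` wiring, and data collator are unknown, and a single training device is assumed so that a per-device batch of 8 with 16 accumulation steps reproduces the reported total train batch size of 128 (8 × 16).

```python
# A sketch of TrainingArguments matching the hyperparameters above.
# Assumes a single GPU, so 8 (per-device batch) x 16 (accumulation steps)
# yields the reported total_train_batch_size of 128.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    seed=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```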
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6049 | 0.0075 | 5 | 1.3862 | 273640 |
1.6224 | 0.0151 | 10 | 1.3404 | 554216 |
1.4024 | 0.0226 | 15 | 1.2742 | 825824 |
1.3776 | 0.0302 | 20 | 1.2246 | 1100896 |
1.2832 | 0.0377 | 25 | 1.1803 | 1379192 |
1.22 | 0.0452 | 30 | 1.1783 | 1656944 |
0.9584 | 0.0528 | 35 | 1.1731 | 1925784 |
0.8881 | 0.0603 | 40 | 1.2068 | 2192080 |
0.8391 | 0.0678 | 45 | 1.2100 | 2459864 |
0.7926 | 0.0754 | 50 | 1.2160 | 2736544 |
0.647 | 0.0829 | 55 | 1.2217 | 3005032 |
0.6438 | 0.0905 | 60 | 1.2151 | 3277256 |
0.5487 | 0.0980 | 65 | 1.2157 | 3547224 |
0.536 | 0.1055 | 70 | 1.2048 | 3817448 |
0.4943 | 0.1131 | 75 | 1.1964 | 4094432 |
0.5394 | 0.1206 | 80 | 1.1933 | 4367400 |
0.3851 | 0.1282 | 85 | 1.1909 | 4635248 |
0.4303 | 0.1357 | 90 | 1.1893 | 4903792 |
0.4199 | 0.1432 | 95 | 1.1818 | 5173464 |
0.3878 | 0.1508 | 100 | 1.1820 | 5446408 |
0.4044 | 0.1583 | 105 | 1.1846 | 5722824 |
0.3266 | 0.1658 | 110 | 1.1800 | 5998616 |
0.3367 | 0.1734 | 115 | 1.1756 | 6269328 |
0.2639 | 0.1809 | 120 | 1.1786 | 6542264 |
0.2647 | 0.1885 | 125 | 1.1753 | 6813600 |
0.3762 | 0.1960 | 130 | 1.1739 | 7087552 |
0.3209 | 0.2035 | 135 | 1.1699 | 7360376 |
0.3376 | 0.2111 | 140 | 1.1709 | 7632536 |
0.2674 | 0.2186 | 145 | 1.1719 | 7901296 |
0.2631 | 0.2262 | 150 | 1.1681 | 8167576 |
0.3092 | 0.2337 | 155 | 1.1664 | 8438360 |
0.3305 | 0.2412 | 160 | 1.1669 | 8709792 |
0.3066 | 0.2488 | 165 | 1.1607 | 8988856 |
0.2807 | 0.2563 | 170 | 1.1590 | 9265304 |
0.3085 | 0.2639 | 175 | 1.1574 | 9543928 |
0.2921 | 0.2714 | 180 | 1.1527 | 9817056 |
0.3605 | 0.2789 | 185 | 1.1557 | 10088872 |
0.2578 | 0.2865 | 190 | 1.1481 | 10360768 |
0.3511 | 0.2940 | 195 | 1.1570 | 10632016 |
0.3591 | 0.3015 | 200 | 1.1461 | 10907720 |
0.2076 | 0.3091 | 205 | 1.1540 | 11181728 |
0.3326 | 0.3166 | 210 | 1.1482 | 11460608 |
0.3914 | 0.3242 | 215 | 1.1478 | 11730288 |
0.304 | 0.3317 | 220 | 1.1487 | 12001208 |
0.3811 | 0.3392 | 225 | 1.1459 | 12272960 |
0.2744 | 0.3468 | 230 | 1.1408 | 12542408 |
0.326 | 0.3543 | 235 | 1.1443 | 12813656 |
0.3474 | 0.3619 | 240 | 1.1414 | 13084432 |
0.3346 | 0.3694 | 245 | 1.1430 | 13360240 |
0.2965 | 0.3769 | 250 | 1.1417 | 13639536 |
0.2382 | 0.3845 | 255 | 1.1373 | 13914080 |
0.2243 | 0.3920 | 260 | 1.1406 | 14189128 |
0.1954 | 0.3995 | 265 | 1.1370 | 14460672 |
0.2857 | 0.4071 | 270 | 1.1398 | 14727040 |
0.2819 | 0.4146 | 275 | 1.1351 | 15002688 |
0.2801 | 0.4222 | 280 | 1.1367 | 15275512 |
0.2907 | 0.4297 | 285 | 1.1351 | 15554848 |
0.2928 | 0.4372 | 290 | 1.1314 | 15828296 |
0.2588 | 0.4448 | 295 | 1.1358 | 16106416 |
0.2453 | 0.4523 | 300 | 1.1329 | 16381944 |
0.3333 | 0.4599 | 305 | 1.1309 | 16661632 |
0.1884 | 0.4674 | 310 | 1.1300 | 16934712 |
0.3095 | 0.4749 | 315 | 1.1309 | 17209816 |
0.2858 | 0.4825 | 320 | 1.1301 | 17484664 |
0.3195 | 0.4900 | 325 | 1.1264 | 17759488 |
0.3203 | 0.4975 | 330 | 1.1277 | 18034664 |
0.3492 | 0.5051 | 335 | 1.1266 | 18311424 |
0.3129 | 0.5126 | 340 | 1.1249 | 18584528 |
0.2546 | 0.5202 | 345 | 1.1277 | 18861208 |
0.2907 | 0.5277 | 350 | 1.1233 | 19135856 |
0.2693 | 0.5352 | 355 | 1.1235 | 19415704 |
0.2942 | 0.5428 | 360 | 1.1219 | 19685048 |
0.2393 | 0.5503 | 365 | 1.1222 | 19954816 |
0.2333 | 0.5579 | 370 | 1.1219 | 20226432 |
0.2208 | 0.5654 | 375 | 1.1232 | 20499384 |
0.2508 | 0.5729 | 380 | 1.1209 | 20779280 |
0.2002 | 0.5805 | 385 | 1.1235 | 21053584 |
0.3333 | 0.5880 | 390 | 1.1216 | 21325712 |
0.2492 | 0.5956 | 395 | 1.1233 | 21599000 |
0.2484 | 0.6031 | 400 | 1.1225 | 21871640 |
0.3439 | 0.6106 | 405 | 1.1191 | 22140448 |
0.3389 | 0.6182 | 410 | 1.1218 | 22409872 |
0.2778 | 0.6257 | 415 | 1.1197 | 22691600 |
0.2713 | 0.6332 | 420 | 1.1177 | 22961160 |
0.2169 | 0.6408 | 425 | 1.1194 | 23229808 |
0.2825 | 0.6483 | 430 | 1.1193 | 23493888 |
0.2436 | 0.6559 | 435 | 1.1170 | 23766688 |
0.3057 | 0.6634 | 440 | 1.1191 | 24038552 |
0.2639 | 0.6709 | 445 | 1.1159 | 24312808 |
0.322 | 0.6785 | 450 | 1.1162 | 24589072 |
0.1909 | 0.6860 | 455 | 1.1180 | 24855872 |
0.2823 | 0.6936 | 460 | 1.1171 | 25129120 |
0.2644 | 0.7011 | 465 | 1.1143 | 25401832 |
0.2379 | 0.7086 | 470 | 1.1151 | 25676584 |
0.2572 | 0.7162 | 475 | 1.1151 | 25946424 |
0.1768 | 0.7237 | 480 | 1.1121 | 26216712 |
0.3079 | 0.7312 | 485 | 1.1137 | 26483648 |
0.1986 | 0.7388 | 490 | 1.1112 | 26756200 |
0.2847 | 0.7463 | 495 | 1.1128 | 27024176 |
0.1732 | 0.7539 | 500 | 1.1135 | 27293512 |
0.2724 | 0.7614 | 505 | 1.1120 | 27569208 |
0.285 | 0.7689 | 510 | 1.1124 | 27836456 |
0.2303 | 0.7765 | 515 | 1.1100 | 28107632 |
0.2479 | 0.7840 | 520 | 1.1107 | 28377688 |
0.2432 | 0.7916 | 525 | 1.1109 | 28646944 |
0.3432 | 0.7991 | 530 | 1.1102 | 28922352 |
0.217 | 0.8066 | 535 | 1.1094 | 29197160 |
0.2464 | 0.8142 | 540 | 1.1099 | 29473128 |
0.3135 | 0.8217 | 545 | 1.1086 | 29746736 |
0.2532 | 0.8292 | 550 | 1.1095 | 30013224 |
0.3145 | 0.8368 | 555 | 1.1090 | 30281256 |
0.207 | 0.8443 | 560 | 1.1067 | 30549144 |
0.1811 | 0.8519 | 565 | 1.1080 | 30828416 |
0.3074 | 0.8594 | 570 | 1.1079 | 31104032 |
0.2753 | 0.8669 | 575 | 1.1048 | 31374216 |
0.155 | 0.8745 | 580 | 1.1082 | 31649384 |
0.2296 | 0.8820 | 585 | 1.1087 | 31920192 |
0.2206 | 0.8896 | 590 | 1.1057 | 32187320 |
0.2657 | 0.8971 | 595 | 1.1065 | 32463088 |
0.2821 | 0.9046 | 600 | 1.1069 | 32731832 |
0.2835 | 0.9122 | 605 | 1.1051 | 33003520 |
0.2168 | 0.9197 | 610 | 1.1063 | 33270088 |
0.2783 | 0.9273 | 615 | 1.1067 | 33542704 |
0.2993 | 0.9348 | 620 | 1.1048 | 33816144 |
0.2227 | 0.9423 | 625 | 1.1027 | 34089248 |
0.243 | 0.9499 | 630 | 1.1044 | 34359824 |
0.2575 | 0.9574 | 635 | 1.1044 | 34638264 |
0.1769 | 0.9649 | 640 | 1.1049 | 34910856 |
0.2472 | 0.9725 | 645 | 1.1055 | 35184536 |
0.2593 | 0.9800 | 650 | 1.1024 | 35455744 |
0.2254 | 0.9876 | 655 | 1.1048 | 35726536 |
0.1744 | 0.9951 | 660 | 1.1068 | 35999296 |
### Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1