# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.0939
- Num Input Tokens Seen: 36687080
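
A minimal usage sketch, assuming the checkpoint loads with the standard `transformers` causal-LM classes (as the base Gemma-2 model does); the prompt is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; replace with your own input.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```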

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
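
As a rough guide, the hyperparameters above map onto `transformers.TrainingArguments` roughly as sketched below. This is an assumption about the setup, not the author's actual training script; the model, dataset, and collator are omitted:

```python
from transformers import TrainingArguments

# Sketch of how the listed hyperparameters map onto TrainingArguments.
# Effective batch size: 8 per device x 16 accumulation steps = 128.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```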

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.5743 | 0.0076 | 5 | 1.3850 | 286024 |
1.5698 | 0.0152 | 10 | 1.3359 | 565176 |
1.5023 | 0.0227 | 15 | 1.2721 | 843224 |
1.3784 | 0.0303 | 20 | 1.2210 | 1128808 |
1.1853 | 0.0379 | 25 | 1.1834 | 1409632 |
1.079 | 0.0455 | 30 | 1.1911 | 1688000 |
0.9274 | 0.0531 | 35 | 1.2022 | 1961576 |
0.8275 | 0.0607 | 40 | 1.2078 | 2242896 |
0.6817 | 0.0682 | 45 | 1.2485 | 2524032 |
0.5892 | 0.0758 | 50 | 1.2344 | 2801792 |
0.4418 | 0.0834 | 55 | 1.2415 | 3078040 |
0.4992 | 0.0910 | 60 | 1.1980 | 3358368 |
0.4529 | 0.0986 | 65 | 1.2040 | 3643320 |
0.4315 | 0.1062 | 70 | 1.2063 | 3920184 |
0.3633 | 0.1137 | 75 | 1.1887 | 4195744 |
0.3498 | 0.1213 | 80 | 1.1900 | 4474088 |
0.5205 | 0.1289 | 85 | 1.1810 | 4750552 |
0.4456 | 0.1365 | 90 | 1.1784 | 5033120 |
0.2259 | 0.1441 | 95 | 1.1689 | 5308224 |
0.2957 | 0.1517 | 100 | 1.1673 | 5584192 |
0.2861 | 0.1592 | 105 | 1.1622 | 5855384 |
0.396 | 0.1668 | 110 | 1.1576 | 6135472 |
0.2727 | 0.1744 | 115 | 1.1593 | 6417808 |
0.2863 | 0.1820 | 120 | 1.1536 | 6694768 |
0.3506 | 0.1896 | 125 | 1.1512 | 6974920 |
0.3593 | 0.1972 | 130 | 1.1506 | 7250952 |
0.3129 | 0.2047 | 135 | 1.1464 | 7528424 |
0.305 | 0.2123 | 140 | 1.1471 | 7796288 |
0.2969 | 0.2199 | 145 | 1.1458 | 8071736 |
0.3828 | 0.2275 | 150 | 1.1450 | 8354136 |
0.2908 | 0.2351 | 155 | 1.1426 | 8627856 |
0.3691 | 0.2427 | 160 | 1.1403 | 8906272 |
0.248 | 0.2502 | 165 | 1.1434 | 9190272 |
0.2853 | 0.2578 | 170 | 1.1398 | 9467688 |
0.336 | 0.2654 | 175 | 1.1423 | 9745264 |
0.2295 | 0.2730 | 180 | 1.1392 | 10022808 |
0.2522 | 0.2806 | 185 | 1.1382 | 10307056 |
0.2513 | 0.2882 | 190 | 1.1442 | 10582992 |
0.2799 | 0.2957 | 195 | 1.1370 | 10866240 |
0.2176 | 0.3033 | 200 | 1.1359 | 11148368 |
0.293 | 0.3109 | 205 | 1.1353 | 11433232 |
0.3076 | 0.3185 | 210 | 1.1317 | 11705656 |
0.2469 | 0.3261 | 215 | 1.1337 | 11983632 |
0.3734 | 0.3336 | 220 | 1.1323 | 12266112 |
0.2704 | 0.3412 | 225 | 1.1290 | 12547976 |
0.3469 | 0.3488 | 230 | 1.1300 | 12824592 |
0.3266 | 0.3564 | 235 | 1.1280 | 13098760 |
0.2528 | 0.3640 | 240 | 1.1268 | 13368616 |
0.2867 | 0.3716 | 245 | 1.1266 | 13650008 |
0.228 | 0.3791 | 250 | 1.1262 | 13927240 |
0.233 | 0.3867 | 255 | 1.1249 | 14203184 |
0.2724 | 0.3943 | 260 | 1.1250 | 14475384 |
0.2117 | 0.4019 | 265 | 1.1245 | 14760384 |
0.1981 | 0.4095 | 270 | 1.1226 | 15040960 |
0.2519 | 0.4171 | 275 | 1.1219 | 15323064 |
0.4068 | 0.4246 | 280 | 1.1205 | 15603904 |
0.2811 | 0.4322 | 285 | 1.1214 | 15883608 |
0.259 | 0.4398 | 290 | 1.1201 | 16159520 |
0.2938 | 0.4474 | 295 | 1.1208 | 16437656 |
0.2466 | 0.4550 | 300 | 1.1214 | 16716952 |
0.2997 | 0.4626 | 305 | 1.1162 | 16992344 |
0.2268 | 0.4701 | 310 | 1.1229 | 17268760 |
0.343 | 0.4777 | 315 | 1.1172 | 17547648 |
0.2424 | 0.4853 | 320 | 1.1154 | 17828288 |
0.2849 | 0.4929 | 325 | 1.1172 | 18107576 |
0.478 | 0.5005 | 330 | 1.1155 | 18387728 |
0.1959 | 0.5081 | 335 | 1.1162 | 18667088 |
0.1868 | 0.5156 | 340 | 1.1160 | 18950480 |
0.234 | 0.5232 | 345 | 1.1150 | 19228760 |
0.2519 | 0.5308 | 350 | 1.1135 | 19508952 |
0.2625 | 0.5384 | 355 | 1.1145 | 19787448 |
0.3843 | 0.5460 | 360 | 1.1109 | 20073168 |
0.3005 | 0.5536 | 365 | 1.1109 | 20343008 |
0.1833 | 0.5611 | 370 | 1.1110 | 20623352 |
0.2446 | 0.5687 | 375 | 1.1093 | 20901240 |
0.25 | 0.5763 | 380 | 1.1104 | 21185296 |
0.2897 | 0.5839 | 385 | 1.1103 | 21464672 |
0.168 | 0.5915 | 390 | 1.1099 | 21743520 |
0.2387 | 0.5991 | 395 | 1.1106 | 22023544 |
0.2066 | 0.6066 | 400 | 1.1072 | 22291944 |
0.2191 | 0.6142 | 405 | 1.1089 | 22572096 |
0.1869 | 0.6218 | 410 | 1.1085 | 22849472 |
0.1939 | 0.6294 | 415 | 1.1075 | 23126440 |
0.2368 | 0.6370 | 420 | 1.1091 | 23406096 |
0.2209 | 0.6445 | 425 | 1.1066 | 23678072 |
0.2523 | 0.6521 | 430 | 1.1077 | 23961192 |
0.2416 | 0.6597 | 435 | 1.1082 | 24240520 |
0.1964 | 0.6673 | 440 | 1.1057 | 24520856 |
0.2369 | 0.6749 | 445 | 1.1055 | 24798288 |
0.23 | 0.6825 | 450 | 1.1074 | 25075848 |
0.2349 | 0.6900 | 455 | 1.1046 | 25344112 |
0.243 | 0.6976 | 460 | 1.1063 | 25625216 |
0.3343 | 0.7052 | 465 | 1.1066 | 25901904 |
0.2341 | 0.7128 | 470 | 1.1042 | 26177128 |
0.283 | 0.7204 | 475 | 1.1059 | 26459400 |
0.3112 | 0.7280 | 480 | 1.1066 | 26736784 |
0.3015 | 0.7355 | 485 | 1.1042 | 27017152 |
0.2788 | 0.7431 | 490 | 1.1031 | 27295048 |
0.1838 | 0.7507 | 495 | 1.1025 | 27575392 |
0.2366 | 0.7583 | 500 | 1.1036 | 27852328 |
0.297 | 0.7659 | 505 | 1.1032 | 28130032 |
0.1622 | 0.7735 | 510 | 1.1015 | 28407672 |
0.165 | 0.7810 | 515 | 1.1012 | 28680696 |
0.3047 | 0.7886 | 520 | 1.1010 | 28957216 |
0.336 | 0.7962 | 525 | 1.1012 | 29235048 |
0.2728 | 0.8038 | 530 | 1.1011 | 29507352 |
0.2007 | 0.8114 | 535 | 1.1008 | 29778208 |
0.2253 | 0.8190 | 540 | 1.1013 | 30055416 |
0.2386 | 0.8265 | 545 | 1.0982 | 30333728 |
0.2056 | 0.8341 | 550 | 1.0989 | 30599088 |
0.2879 | 0.8417 | 555 | 1.1003 | 30883072 |
0.2207 | 0.8493 | 560 | 1.0993 | 31160232 |
0.2821 | 0.8569 | 565 | 1.0979 | 31441272 |
0.2246 | 0.8645 | 570 | 1.0982 | 31712696 |
0.3249 | 0.8720 | 575 | 1.0980 | 31991400 |
0.2616 | 0.8796 | 580 | 1.0985 | 32269224 |
0.2716 | 0.8872 | 585 | 1.0997 | 32542384 |
0.2898 | 0.8948 | 590 | 1.0979 | 32826016 |
0.2617 | 0.9024 | 595 | 1.0968 | 33110848 |
0.2057 | 0.9100 | 600 | 1.0988 | 33391352 |
0.293 | 0.9175 | 605 | 1.0965 | 33670472 |
0.2081 | 0.9251 | 610 | 1.0947 | 33950936 |
0.2801 | 0.9327 | 615 | 1.0963 | 34226952 |
0.2678 | 0.9403 | 620 | 1.0952 | 34502376 |
0.222 | 0.9479 | 625 | 1.0944 | 34774480 |
0.2561 | 0.9555 | 630 | 1.0944 | 35057720 |
0.2738 | 0.9630 | 635 | 1.0947 | 35333096 |
0.182 | 0.9706 | 640 | 1.0947 | 35614552 |
0.224 | 0.9782 | 645 | 1.0935 | 35890992 |
0.2861 | 0.9858 | 650 | 1.0935 | 36177736 |
0.2674 | 0.9934 | 655 | 1.0948 | 36462944 |
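
For intuition, the final validation loss of 1.0939 corresponds to a perplexity of exp(1.0939) ≈ 2.99; a one-line check:

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss.
print(math.exp(1.0939))  # ≈ 2.99
```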

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1