# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0909
- Num Input Tokens Seen: 25913464
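As a quick start, here is a minimal loading sketch using `transformers`, assuming the checkpoint is hosted on the Hugging Face Hub under the repo id `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1`:

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a short completion.
# The repo id is assumed from the model's Hub page.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```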
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them is shown after the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
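For reference, here is a minimal sketch of these settings expressed as `transformers.TrainingArguments`. The total train batch size of 128 follows from 8 (per-device batch size) × 16 (gradient accumulation steps) on a single device, so only those two values are set explicitly; `output_dir` is a placeholder.

```python
# Sketch: TrainingArguments mirroring the hyperparameters listed above.
# total_train_batch_size = 8 (per device) * 16 (accumulation) = 128, assuming one device.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```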
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
No log | 0 | 0 | 1.3909 | 0 |
1.5757 | 0.0106 | 5 | 1.3785 | 273728 |
1.5086 | 0.0212 | 10 | 1.3024 | 553176 |
1.3703 | 0.0318 | 15 | 1.2301 | 832928 |
1.2237 | 0.0424 | 20 | 1.1798 | 1112400 |
1.1043 | 0.0530 | 25 | 1.1741 | 1387256 |
0.8871 | 0.0636 | 30 | 1.1676 | 1667816 |
0.8128 | 0.0742 | 35 | 1.1807 | 1935720 |
0.8159 | 0.0848 | 40 | 1.1931 | 2212104 |
0.7139 | 0.0955 | 45 | 1.2108 | 2488864 |
0.6054 | 0.1061 | 50 | 1.1934 | 2759968 |
0.5794 | 0.1167 | 55 | 1.1874 | 3037768 |
0.4857 | 0.1273 | 60 | 1.1861 | 3315040 |
0.5228 | 0.1379 | 65 | 1.1744 | 3590680 |
0.5009 | 0.1485 | 70 | 1.1665 | 3866264 |
0.4853 | 0.1591 | 75 | 1.1741 | 4138640 |
0.4493 | 0.1697 | 80 | 1.1581 | 4408560 |
0.4206 | 0.1803 | 85 | 1.1612 | 4676520 |
0.3377 | 0.1909 | 90 | 1.1532 | 4956920 |
0.3708 | 0.2015 | 95 | 1.1524 | 5230480 |
0.4861 | 0.2121 | 100 | 1.1467 | 5510432 |
0.415 | 0.2227 | 105 | 1.1487 | 5783888 |
0.3656 | 0.2333 | 110 | 1.1439 | 6059904 |
0.4284 | 0.2439 | 115 | 1.1477 | 6333552 |
0.3727 | 0.2545 | 120 | 1.1430 | 6607432 |
0.4572 | 0.2651 | 125 | 1.1448 | 6884048 |
0.3842 | 0.2758 | 130 | 1.1388 | 7161200 |
0.3452 | 0.2864 | 135 | 1.1418 | 7443528 |
0.3085 | 0.2970 | 140 | 1.1353 | 7719360 |
0.4154 | 0.3076 | 145 | 1.1353 | 8001024 |
0.3739 | 0.3182 | 150 | 1.1316 | 8281392 |
0.3435 | 0.3288 | 155 | 1.1313 | 8553600 |
0.356 | 0.3394 | 160 | 1.1337 | 8825544 |
0.3751 | 0.3500 | 165 | 1.1262 | 9098040 |
0.3788 | 0.3606 | 170 | 1.1268 | 9377472 |
0.3203 | 0.3712 | 175 | 1.1266 | 9649408 |
0.3023 | 0.3818 | 180 | 1.1224 | 9930488 |
0.3961 | 0.3924 | 185 | 1.1217 | 10204672 |
0.4728 | 0.4030 | 190 | 1.1191 | 10476840 |
0.3212 | 0.4136 | 195 | 1.1211 | 10748672 |
0.3261 | 0.4242 | 200 | 1.1176 | 11022304 |
0.2691 | 0.4348 | 205 | 1.1170 | 11294832 |
0.2953 | 0.4454 | 210 | 1.1151 | 11571256 |
0.3242 | 0.4561 | 215 | 1.1162 | 11845312 |
0.3608 | 0.4667 | 220 | 1.1142 | 12124880 |
0.3344 | 0.4773 | 225 | 1.1133 | 12396192 |
0.2966 | 0.4879 | 230 | 1.1142 | 12663864 |
0.3665 | 0.4985 | 235 | 1.1141 | 12938920 |
0.3217 | 0.5091 | 240 | 1.1155 | 13209424 |
0.3376 | 0.5197 | 245 | 1.1119 | 13482760 |
0.3636 | 0.5303 | 250 | 1.1130 | 13749552 |
0.3988 | 0.5409 | 255 | 1.1115 | 14022304 |
0.361 | 0.5515 | 260 | 1.1087 | 14298840 |
0.3727 | 0.5621 | 265 | 1.1117 | 14569648 |
0.3881 | 0.5727 | 270 | 1.1083 | 14844120 |
0.324 | 0.5833 | 275 | 1.1086 | 15119496 |
0.4137 | 0.5939 | 280 | 1.1079 | 15395456 |
0.4208 | 0.6045 | 285 | 1.1058 | 15671704 |
0.2808 | 0.6151 | 290 | 1.1065 | 15944040 |
0.2928 | 0.6257 | 295 | 1.1055 | 16220520 |
0.4027 | 0.6364 | 300 | 1.1075 | 16491504 |
0.2943 | 0.6470 | 305 | 1.1053 | 16765024 |
0.3012 | 0.6576 | 310 | 1.1059 | 17039080 |
0.2789 | 0.6682 | 315 | 1.1039 | 17318648 |
0.3305 | 0.6788 | 320 | 1.1030 | 17596848 |
0.321 | 0.6894 | 325 | 1.1018 | 17870976 |
0.3127 | 0.7000 | 330 | 1.1039 | 18137760 |
0.3792 | 0.7106 | 335 | 1.1030 | 18410248 |
0.3946 | 0.7212 | 340 | 1.0999 | 18677968 |
0.334 | 0.7318 | 345 | 1.1031 | 18947432 |
0.3146 | 0.7424 | 350 | 1.1030 | 19227968 |
0.3158 | 0.7530 | 355 | 1.0988 | 19509360 |
0.2907 | 0.7636 | 360 | 1.1000 | 19785616 |
0.4204 | 0.7742 | 365 | 1.1001 | 20056848 |
0.2924 | 0.7848 | 370 | 1.1002 | 20335856 |
0.3222 | 0.7954 | 375 | 1.0997 | 20613064 |
0.3221 | 0.8060 | 380 | 1.0989 | 20884992 |
0.3005 | 0.8167 | 385 | 1.0967 | 21162232 |
0.3183 | 0.8273 | 390 | 1.0968 | 21438576 |
0.3396 | 0.8379 | 395 | 1.0980 | 21715544 |
0.3205 | 0.8485 | 400 | 1.0947 | 21988384 |
0.3199 | 0.8591 | 405 | 1.0972 | 22266120 |
0.314 | 0.8697 | 410 | 1.0939 | 22539560 |
0.4633 | 0.8803 | 415 | 1.0941 | 22813776 |
0.3282 | 0.8909 | 420 | 1.0940 | 23090296 |
0.3576 | 0.9015 | 425 | 1.0933 | 23369344 |
0.3411 | 0.9121 | 430 | 1.0934 | 23645208 |
0.2557 | 0.9227 | 435 | 1.0935 | 23919016 |
0.4153 | 0.9333 | 440 | 1.0922 | 24194664 |
0.3082 | 0.9439 | 445 | 1.0929 | 24470512 |
0.2994 | 0.9545 | 450 | 1.0925 | 24748488 |
0.2968 | 0.9651 | 455 | 1.0915 | 25029504 |
0.3045 | 0.9757 | 460 | 1.0936 | 25307368 |
0.273 | 0.9863 | 465 | 1.0917 | 25584672 |
0.3096 | 0.9970 | 470 | 1.0909 | 25862576 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
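To reproduce this environment, here is a small sketch that checks the installed package versions against those listed above (it assumes all four packages are importable):

```python
# Sketch: verify installed versions match those used for training.
import datasets, tokenizers, torch, transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"MISMATCH (have {installed[name]})"
    print(f"{name}=={want}: {status}")
```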