---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
  results: []
---
# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1021
- Num Input Tokens Seen: 21968712
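
As a quick orientation, here is a minimal inference sketch using the Transformers API. The repo id is assumed to match the model name above (adjust it to the checkpoint's actual location), and the prompt is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; point this at wherever the checkpoint actually lives.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are commonly run in bf16
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; the SFT data for this run is not documented.
inputs = tokenizer("Write a short haiku about autumn.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```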
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (mirrored in the configuration sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
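
For anyone looking to reproduce this setup, here is a minimal sketch of the equivalent Transformers `TrainingArguments`. The `output_dir` value and the pairing with TRL's `SFTTrainer` are assumptions inferred from the model tags, not confirmed details of the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",  # assumed path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 effective batch
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```

These arguments would then be passed to a trainer (given this model's tags, presumably `trl.SFTTrainer`) along with the base model and training dataset.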
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6085 | 0.0130 | 5 | 1.3800 | 289184 |
| 1.4378 | 0.0260 | 10 | 1.2933 | 571680 |
| 1.3575 | 0.0390 | 15 | 1.2182 | 858680 |
| 1.3348 | 0.0520 | 20 | 1.1684 | 1145936 |
| 1.1904 | 0.0650 | 25 | 1.1500 | 1437472 |
| 1.2228 | 0.0779 | 30 | 1.1339 | 1724288 |
| 1.0694 | 0.0909 | 35 | 1.1383 | 2009272 |
| 0.9697 | 0.1039 | 40 | 1.1630 | 2289000 |
| 0.9051 | 0.1169 | 45 | 1.1742 | 2569208 |
| 0.8855 | 0.1299 | 50 | 1.1729 | 2856576 |
| 0.8853 | 0.1429 | 55 | 1.1758 | 3146856 |
| 0.8296 | 0.1559 | 60 | 1.1816 | 3431392 |
| 0.7121 | 0.1689 | 65 | 1.1736 | 3726000 |
| 0.7528 | 0.1819 | 70 | 1.1792 | 4010080 |
| 0.5996 | 0.1949 | 75 | 1.1802 | 4295264 |
| 0.6437 | 0.2079 | 80 | 1.1785 | 4576256 |
| 0.6683 | 0.2209 | 85 | 1.1733 | 4869384 |
| 0.5115 | 0.2338 | 90 | 1.1750 | 5151776 |
| 0.545 | 0.2468 | 95 | 1.1701 | 5443960 |
| 0.5348 | 0.2598 | 100 | 1.1673 | 5728368 |
| 0.5687 | 0.2728 | 105 | 1.1641 | 6017560 |
| 0.4856 | 0.2858 | 110 | 1.1663 | 6300000 |
| 0.4691 | 0.2988 | 115 | 1.1630 | 6586672 |
| 0.4454 | 0.3118 | 120 | 1.1585 | 6869504 |
| 0.5734 | 0.3248 | 125 | 1.1606 | 7159680 |
| 0.4317 | 0.3378 | 130 | 1.1529 | 7437936 |
| 0.4603 | 0.3508 | 135 | 1.1541 | 7727120 |
| 0.5264 | 0.3638 | 140 | 1.1542 | 8013352 |
| 0.5051 | 0.3767 | 145 | 1.1493 | 8302848 |
| 0.397 | 0.3897 | 150 | 1.1528 | 8588472 |
| 0.4173 | 0.4027 | 155 | 1.1463 | 8876960 |
| 0.3443 | 0.4157 | 160 | 1.1474 | 9156600 |
| 0.4343 | 0.4287 | 165 | 1.1455 | 9440520 |
| 0.4683 | 0.4417 | 170 | 1.1431 | 9726600 |
| 0.4732 | 0.4547 | 175 | 1.1408 | 10009248 |
| 0.4876 | 0.4677 | 180 | 1.1414 | 10297320 |
| 0.4574 | 0.4807 | 185 | 1.1369 | 10582704 |
| 0.4038 | 0.4937 | 190 | 1.1354 | 10870648 |
| 0.4239 | 0.5067 | 195 | 1.1355 | 11148576 |
| 0.5262 | 0.5196 | 200 | 1.1291 | 11436464 |
| 0.4788 | 0.5326 | 205 | 1.1322 | 11721416 |
| 0.3975 | 0.5456 | 210 | 1.1276 | 12012696 |
| 0.3807 | 0.5586 | 215 | 1.1310 | 12299376 |
| 0.4784 | 0.5716 | 220 | 1.1232 | 12594368 |
| 0.4 | 0.5846 | 225 | 1.1272 | 12880616 |
| 0.4511 | 0.5976 | 230 | 1.1229 | 13164112 |
| 0.4119 | 0.6106 | 235 | 1.1234 | 13446016 |
| 0.3515 | 0.6236 | 240 | 1.1224 | 13729688 |
| 0.3695 | 0.6366 | 245 | 1.1201 | 14015064 |
| 0.387 | 0.6496 | 250 | 1.1190 | 14303192 |
| 0.4503 | 0.6626 | 255 | 1.1167 | 14587200 |
| 0.3205 | 0.6755 | 260 | 1.1184 | 14875032 |
| 0.3369 | 0.6885 | 265 | 1.1154 | 15159592 |
| 0.46 | 0.7015 | 270 | 1.1173 | 15443480 |
| 0.4148 | 0.7145 | 275 | 1.1121 | 15737624 |
| 0.4251 | 0.7275 | 280 | 1.1141 | 16021928 |
| 0.3786 | 0.7405 | 285 | 1.1126 | 16306944 |
| 0.3593 | 0.7535 | 290 | 1.1114 | 16592904 |
| 0.4698 | 0.7665 | 295 | 1.1114 | 16875744 |
| 0.3327 | 0.7795 | 300 | 1.1098 | 17163408 |
| 0.3521 | 0.7925 | 305 | 1.1125 | 17451024 |
| 0.3682 | 0.8055 | 310 | 1.1076 | 17741680 |
| 0.3266 | 0.8184 | 315 | 1.1098 | 18022800 |
| 0.3986 | 0.8314 | 320 | 1.1078 | 18298600 |
| 0.3869 | 0.8444 | 325 | 1.1078 | 18585288 |
| 0.3904 | 0.8574 | 330 | 1.1072 | 18870912 |
| 0.361 | 0.8704 | 335 | 1.1070 | 19165960 |
| 0.4643 | 0.8834 | 340 | 1.1047 | 19458704 |
| 0.4603 | 0.8964 | 345 | 1.1048 | 19741152 |
| 0.4815 | 0.9094 | 350 | 1.1053 | 20029752 |
| 0.3097 | 0.9224 | 355 | 1.1050 | 20317240 |
| 0.3686 | 0.9354 | 360 | 1.1033 | 20601320 |
| 0.485 | 0.9484 | 365 | 1.1042 | 20895904 |
| 0.3946 | 0.9614 | 370 | 1.1014 | 21179672 |
| 0.4621 | 0.9743 | 375 | 1.1032 | 21460376 |
| 0.4748 | 0.9873 | 380 | 1.1025 | 21737656 |
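
Read as a curve, the validation loss drops sharply from 1.3956, bottoms out near step 30, climbs to about 1.18 around step 60, and then declines gradually toward the final 1.1021. A short matplotlib sketch of that trajectory, using a subset of the checkpoints copied from the table above (matplotlib is an assumed extra dependency, not part of the framework stack listed below):

```python
import matplotlib.pyplot as plt

# A subset of (step, validation loss) checkpoints copied from the table above.
steps = [0, 20, 60, 120, 200, 280, 380]
val_loss = [1.3956, 1.1684, 1.1816, 1.1585, 1.1291, 1.1141, 1.1025]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0 eval loss")
plt.savefig("eval_loss.png")
```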
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1