--- |
|
license: gemma |
|
base_model: google/gemma-2-2b |
|
tags: |
|
- trl |
|
- sft |
|
- generated_from_trainer |
|
model-index: |
|
- name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0 |
|
results: [] |
|
--- |
|
|
|
|
|
|
# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0 |
|
|
|
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 1.1021 |
|
- Num Input Tokens Seen: 21968712 |
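
The checkpoint loads like any other `transformers` causal LM. The snippet below is a minimal inference sketch, assuming the model is published under the repository id shown (replace it with the actual path) and that `torch` and `accelerate` are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; replace with the actual path to this checkpoint.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are distributed in bfloat16
    device_map="auto",           # requires the accelerate package
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```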
|
|
|
## Model description |
|
|
|
More information needed |
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a hedged reproduction sketch follows the list):
|
- learning_rate: 8e-06 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 16 |
|
- seed: 0 |
|
- gradient_accumulation_steps: 16 |
|
- total_train_batch_size: 128 |
|
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
|
- lr_scheduler_type: constant_with_warmup |
|
- lr_scheduler_warmup_ratio: 0.05 |
|
- num_epochs: 1 |
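
Since the training script is not included with this card, the following is only a sketch of how these settings could be wired up with TRL's `SFTTrainer` (assuming TRL >= 0.9, where `SFTConfig` exists). The dataset file, its `text` field, and the output directory are placeholder assumptions:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder data file; the card does not identify the actual training set.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 x 16 = total train batch size of 128
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    dataset_text_field="text",  # assumes examples are stored under a "text" key
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults, matching the card.
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # SFTTrainer also accepts a model id string
    args=config,
    train_dataset=dataset,
)
trainer.train()
```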
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6085 | 0.0130 | 5 | 1.3800 | 289184 |
| 1.4378 | 0.0260 | 10 | 1.2933 | 571680 |
| 1.3575 | 0.0390 | 15 | 1.2182 | 858680 |
| 1.3348 | 0.0520 | 20 | 1.1684 | 1145936 |
| 1.1904 | 0.0650 | 25 | 1.1500 | 1437472 |
| 1.2228 | 0.0779 | 30 | 1.1339 | 1724288 |
| 1.0694 | 0.0909 | 35 | 1.1383 | 2009272 |
| 0.9697 | 0.1039 | 40 | 1.1630 | 2289000 |
| 0.9051 | 0.1169 | 45 | 1.1742 | 2569208 |
| 0.8855 | 0.1299 | 50 | 1.1729 | 2856576 |
| 0.8853 | 0.1429 | 55 | 1.1758 | 3146856 |
| 0.8296 | 0.1559 | 60 | 1.1816 | 3431392 |
| 0.7121 | 0.1689 | 65 | 1.1736 | 3726000 |
| 0.7528 | 0.1819 | 70 | 1.1792 | 4010080 |
| 0.5996 | 0.1949 | 75 | 1.1802 | 4295264 |
| 0.6437 | 0.2079 | 80 | 1.1785 | 4576256 |
| 0.6683 | 0.2209 | 85 | 1.1733 | 4869384 |
| 0.5115 | 0.2338 | 90 | 1.1750 | 5151776 |
| 0.5450 | 0.2468 | 95 | 1.1701 | 5443960 |
| 0.5348 | 0.2598 | 100 | 1.1673 | 5728368 |
| 0.5687 | 0.2728 | 105 | 1.1641 | 6017560 |
| 0.4856 | 0.2858 | 110 | 1.1663 | 6300000 |
| 0.4691 | 0.2988 | 115 | 1.1630 | 6586672 |
| 0.4454 | 0.3118 | 120 | 1.1585 | 6869504 |
| 0.5734 | 0.3248 | 125 | 1.1606 | 7159680 |
| 0.4317 | 0.3378 | 130 | 1.1529 | 7437936 |
| 0.4603 | 0.3508 | 135 | 1.1541 | 7727120 |
| 0.5264 | 0.3638 | 140 | 1.1542 | 8013352 |
| 0.5051 | 0.3767 | 145 | 1.1493 | 8302848 |
| 0.3970 | 0.3897 | 150 | 1.1528 | 8588472 |
| 0.4173 | 0.4027 | 155 | 1.1463 | 8876960 |
| 0.3443 | 0.4157 | 160 | 1.1474 | 9156600 |
| 0.4343 | 0.4287 | 165 | 1.1455 | 9440520 |
| 0.4683 | 0.4417 | 170 | 1.1431 | 9726600 |
| 0.4732 | 0.4547 | 175 | 1.1408 | 10009248 |
| 0.4876 | 0.4677 | 180 | 1.1414 | 10297320 |
| 0.4574 | 0.4807 | 185 | 1.1369 | 10582704 |
| 0.4038 | 0.4937 | 190 | 1.1354 | 10870648 |
| 0.4239 | 0.5067 | 195 | 1.1355 | 11148576 |
| 0.5262 | 0.5196 | 200 | 1.1291 | 11436464 |
| 0.4788 | 0.5326 | 205 | 1.1322 | 11721416 |
| 0.3975 | 0.5456 | 210 | 1.1276 | 12012696 |
| 0.3807 | 0.5586 | 215 | 1.1310 | 12299376 |
| 0.4784 | 0.5716 | 220 | 1.1232 | 12594368 |
| 0.4000 | 0.5846 | 225 | 1.1272 | 12880616 |
| 0.4511 | 0.5976 | 230 | 1.1229 | 13164112 |
| 0.4119 | 0.6106 | 235 | 1.1234 | 13446016 |
| 0.3515 | 0.6236 | 240 | 1.1224 | 13729688 |
| 0.3695 | 0.6366 | 245 | 1.1201 | 14015064 |
| 0.3870 | 0.6496 | 250 | 1.1190 | 14303192 |
| 0.4503 | 0.6626 | 255 | 1.1167 | 14587200 |
| 0.3205 | 0.6755 | 260 | 1.1184 | 14875032 |
| 0.3369 | 0.6885 | 265 | 1.1154 | 15159592 |
| 0.4600 | 0.7015 | 270 | 1.1173 | 15443480 |
| 0.4148 | 0.7145 | 275 | 1.1121 | 15737624 |
| 0.4251 | 0.7275 | 280 | 1.1141 | 16021928 |
| 0.3786 | 0.7405 | 285 | 1.1126 | 16306944 |
| 0.3593 | 0.7535 | 290 | 1.1114 | 16592904 |
| 0.4698 | 0.7665 | 295 | 1.1114 | 16875744 |
| 0.3327 | 0.7795 | 300 | 1.1098 | 17163408 |
| 0.3521 | 0.7925 | 305 | 1.1125 | 17451024 |
| 0.3682 | 0.8055 | 310 | 1.1076 | 17741680 |
| 0.3266 | 0.8184 | 315 | 1.1098 | 18022800 |
| 0.3986 | 0.8314 | 320 | 1.1078 | 18298600 |
| 0.3869 | 0.8444 | 325 | 1.1078 | 18585288 |
| 0.3904 | 0.8574 | 330 | 1.1072 | 18870912 |
| 0.3610 | 0.8704 | 335 | 1.1070 | 19165960 |
| 0.4643 | 0.8834 | 340 | 1.1047 | 19458704 |
| 0.4603 | 0.8964 | 345 | 1.1048 | 19741152 |
| 0.4815 | 0.9094 | 350 | 1.1053 | 20029752 |
| 0.3097 | 0.9224 | 355 | 1.1050 | 20317240 |
| 0.3686 | 0.9354 | 360 | 1.1033 | 20601320 |
| 0.4850 | 0.9484 | 365 | 1.1042 | 20895904 |
| 0.3946 | 0.9614 | 370 | 1.1014 | 21179672 |
| 0.4621 | 0.9743 | 375 | 1.1032 | 21460376 |
| 0.4748 | 0.9873 | 380 | 1.1025 | 21737656 |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.44.0 |
|
- Pytorch 2.4.0+cu121 |
|
- Datasets 2.20.0 |
|
- Tokenizers 0.19.1 |
|
|