|
---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2
  results: []
---
|
|
|
|
|
|
# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2 |
|
|
|
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1029 (a perplexity of exp(1.1029) ≈ 3.01, assuming the loss is mean per-token cross-entropy in nats)
- Num input tokens seen: 41,091,672
|
|
|
## Model description |
|
|
|
More information needed. Per the card's metadata, this is a supervised fine-tune (SFT) of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) trained with TRL; the training data and the purpose of the run are not documented.
|
|
|
## Intended uses & limitations |
|
|
|
More information needed. The checkpoint is distributed under the Gemma license and inherits the capabilities and limitations of the base model. The sketch below shows one way to load it for plain text generation.
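
A minimal usage sketch with `transformers`, assuming the checkpoint is hosted on the Hugging Face Hub; the repository id below is a placeholder, and bf16 is an assumption (Gemma-2 weights are commonly run in bf16):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hub path of this checkpoint.
model_id = "<org>/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; use float32 on CPU if needed
    device_map="auto",
)

# Plain causal generation; this SFT run may expect a specific prompt format,
# which is not documented here.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```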
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
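
The `trl`/`sft` tags indicate the run used TRL's `SFTTrainer`. As a minimal sketch (not the original training script), the values above map onto `transformers.TrainingArguments` as follows; `output_dir` is a placeholder, and the total batch size of 128 assumes a single device:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size above
    per_device_eval_batch_size=16,   # eval_batch_size above
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    seed=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    include_num_input_tokens_seen=True,  # produces the token counts logged below
)
```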
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5665 | 0.0066 | 5 | 1.3873 | 272560 |
| 1.5456 | 0.0132 | 10 | 1.3529 | 547080 |
| 1.4344 | 0.0198 | 15 | 1.2836 | 822880 |
| 1.4512 | 0.0264 | 20 | 1.2345 | 1089864 |
| 1.3462 | 0.0330 | 25 | 1.1901 | 1361576 |
| 1.1712 | 0.0396 | 30 | 1.1835 | 1634504 |
| 1.0826 | 0.0462 | 35 | 1.1964 | 1895936 |
| 0.9291 | 0.0527 | 40 | 1.1914 | 2166120 |
| 0.8296 | 0.0593 | 45 | 1.2208 | 2435904 |
| 0.6654 | 0.0659 | 50 | 1.2499 | 2706240 |
| 0.6401 | 0.0725 | 55 | 1.2356 | 2984976 |
| 0.6449 | 0.0791 | 60 | 1.2089 | 3257728 |
| 0.5585 | 0.0857 | 65 | 1.2026 | 3526976 |
| 0.468 | 0.0923 | 70 | 1.2120 | 3804888 |
| 0.5271 | 0.0989 | 75 | 1.2040 | 4078544 |
| 0.3901 | 0.1055 | 80 | 1.1976 | 4356048 |
| 0.4389 | 0.1121 | 85 | 1.2049 | 4621624 |
| 0.3482 | 0.1187 | 90 | 1.1972 | 4888632 |
| 0.3224 | 0.1253 | 95 | 1.1926 | 5152168 |
| 0.4305 | 0.1319 | 100 | 1.1944 | 5423968 |
| 0.3758 | 0.1385 | 105 | 1.1825 | 5697240 |
| 0.3646 | 0.1450 | 110 | 1.1919 | 5971384 |
| 0.3215 | 0.1516 | 115 | 1.1776 | 6240360 |
| 0.3273 | 0.1582 | 120 | 1.1907 | 6509288 |
| 0.3152 | 0.1648 | 125 | 1.1786 | 6779048 |
| 0.2365 | 0.1714 | 130 | 1.1833 | 7048200 |
| 0.3342 | 0.1780 | 135 | 1.1750 | 7316656 |
| 0.3586 | 0.1846 | 140 | 1.1774 | 7590728 |
| 0.2927 | 0.1912 | 145 | 1.1737 | 7859680 |
| 0.3788 | 0.1978 | 150 | 1.1760 | 8126224 |
| 0.2964 | 0.2044 | 155 | 1.1741 | 8403808 |
| 0.2938 | 0.2110 | 160 | 1.1677 | 8672216 |
| 0.2518 | 0.2176 | 165 | 1.1735 | 8946264 |
| 0.3334 | 0.2242 | 170 | 1.1647 | 9208352 |
| 0.311 | 0.2308 | 175 | 1.1647 | 9477208 |
| 0.3065 | 0.2373 | 180 | 1.1620 | 9748024 |
| 0.2517 | 0.2439 | 185 | 1.1613 | 10021768 |
| 0.2672 | 0.2505 | 190 | 1.1569 | 10293208 |
| 0.2611 | 0.2571 | 195 | 1.1545 | 10569280 |
| 0.2265 | 0.2637 | 200 | 1.1548 | 10840984 |
| 0.3068 | 0.2703 | 205 | 1.1520 | 11116568 |
| 0.2929 | 0.2769 | 210 | 1.1568 | 11394928 |
| 0.3351 | 0.2835 | 215 | 1.1547 | 11666600 |
| 0.2687 | 0.2901 | 220 | 1.1544 | 11946656 |
| 0.2501 | 0.2967 | 225 | 1.1479 | 12224240 |
| 0.1991 | 0.3033 | 230 | 1.1520 | 12500672 |
| 0.2434 | 0.3099 | 235 | 1.1477 | 12767840 |
| 0.1667 | 0.3165 | 240 | 1.1453 | 13035688 |
| 0.2564 | 0.3231 | 245 | 1.1509 | 13312232 |
| 0.2856 | 0.3297 | 250 | 1.1436 | 13584328 |
| 0.305 | 0.3362 | 255 | 1.1425 | 13853288 |
| 0.2765 | 0.3428 | 260 | 1.1456 | 14113512 |
| 0.2209 | 0.3494 | 265 | 1.1455 | 14385280 |
| 0.2125 | 0.3560 | 270 | 1.1410 | 14660096 |
| 0.274 | 0.3626 | 275 | 1.1417 | 14931976 |
| 0.2181 | 0.3692 | 280 | 1.1411 | 15202008 |
| 0.2481 | 0.3758 | 285 | 1.1374 | 15468896 |
| 0.2629 | 0.3824 | 290 | 1.1372 | 15733744 |
| 0.2826 | 0.3890 | 295 | 1.1366 | 16004424 |
| 0.2646 | 0.3956 | 300 | 1.1363 | 16276088 |
| 0.2729 | 0.4022 | 305 | 1.1333 | 16547304 |
| 0.2735 | 0.4088 | 310 | 1.1350 | 16819224 |
| 0.2881 | 0.4154 | 315 | 1.1349 | 17088704 |
| 0.2208 | 0.4220 | 320 | 1.1304 | 17362560 |
| 0.1822 | 0.4285 | 325 | 1.1348 | 17632840 |
| 0.3197 | 0.4351 | 330 | 1.1306 | 17903232 |
| 0.1763 | 0.4417 | 335 | 1.1287 | 18171208 |
| 0.2851 | 0.4483 | 340 | 1.1333 | 18444312 |
| 0.2406 | 0.4549 | 345 | 1.1318 | 18716768 |
| 0.2571 | 0.4615 | 350 | 1.1291 | 18983016 |
| 0.3931 | 0.4681 | 355 | 1.1282 | 19256840 |
| 0.1952 | 0.4747 | 360 | 1.1287 | 19527776 |
| 0.227 | 0.4813 | 365 | 1.1282 | 19800232 |
| 0.2979 | 0.4879 | 370 | 1.1285 | 20074720 |
| 0.1515 | 0.4945 | 375 | 1.1280 | 20350824 |
| 0.336 | 0.5011 | 380 | 1.1254 | 20627392 |
| 0.2381 | 0.5077 | 385 | 1.1258 | 20900344 |
| 0.2331 | 0.5143 | 390 | 1.1253 | 21173120 |
| 0.2176 | 0.5209 | 395 | 1.1250 | 21442720 |
| 0.232 | 0.5274 | 400 | 1.1268 | 21711376 |
| 0.2648 | 0.5340 | 405 | 1.1246 | 21977752 |
| 0.2398 | 0.5406 | 410 | 1.1241 | 22247224 |
| 0.2246 | 0.5472 | 415 | 1.1245 | 22525976 |
| 0.2836 | 0.5538 | 420 | 1.1199 | 22795472 |
| 0.242 | 0.5604 | 425 | 1.1233 | 23063720 |
| 0.2369 | 0.5670 | 430 | 1.1230 | 23333144 |
| 0.2856 | 0.5736 | 435 | 1.1206 | 23599032 |
| 0.2595 | 0.5802 | 440 | 1.1208 | 23871616 |
| 0.2154 | 0.5868 | 445 | 1.1188 | 24144160 |
| 0.2541 | 0.5934 | 450 | 1.1208 | 24412552 |
| 0.2378 | 0.6000 | 455 | 1.1210 | 24683400 |
| 0.233 | 0.6066 | 460 | 1.1183 | 24956656 |
| 0.3136 | 0.6132 | 465 | 1.1211 | 25235888 |
| 0.2549 | 0.6197 | 470 | 1.1185 | 25505944 |
| 0.259 | 0.6263 | 475 | 1.1179 | 25776080 |
| 0.1539 | 0.6329 | 480 | 1.1197 | 26043984 |
| 0.2459 | 0.6395 | 485 | 1.1183 | 26318896 |
| 0.2342 | 0.6461 | 490 | 1.1182 | 26585616 |
| 0.2173 | 0.6527 | 495 | 1.1172 | 26862168 |
| 0.3048 | 0.6593 | 500 | 1.1172 | 27130760 |
| 0.2851 | 0.6659 | 505 | 1.1142 | 27397928 |
| 0.2091 | 0.6725 | 510 | 1.1148 | 27670712 |
| 0.3143 | 0.6791 | 515 | 1.1149 | 27933056 |
| 0.1672 | 0.6857 | 520 | 1.1152 | 28201952 |
| 0.3181 | 0.6923 | 525 | 1.1164 | 28477464 |
| 0.1914 | 0.6989 | 530 | 1.1174 | 28743664 |
| 0.2931 | 0.7055 | 535 | 1.1155 | 29016592 |
| 0.2285 | 0.7120 | 540 | 1.1133 | 29283872 |
| 0.2749 | 0.7186 | 545 | 1.1163 | 29554240 |
| 0.2901 | 0.7252 | 550 | 1.1145 | 29821128 |
| 0.2361 | 0.7318 | 555 | 1.1114 | 30095352 |
| 0.2654 | 0.7384 | 560 | 1.1125 | 30371160 |
| 0.1935 | 0.7450 | 565 | 1.1129 | 30645928 |
| 0.268 | 0.7516 | 570 | 1.1101 | 30919376 |
| 0.1795 | 0.7582 | 575 | 1.1139 | 31186848 |
| 0.2439 | 0.7648 | 580 | 1.1122 | 31459480 |
| 0.259 | 0.7714 | 585 | 1.1091 | 31733560 |
| 0.248 | 0.7780 | 590 | 1.1105 | 32003016 |
| 0.2186 | 0.7846 | 595 | 1.1106 | 32278448 |
| 0.1595 | 0.7912 | 600 | 1.1115 | 32538192 |
| 0.2058 | 0.7978 | 605 | 1.1117 | 32816064 |
| 0.2324 | 0.8044 | 610 | 1.1095 | 33087144 |
| 0.2045 | 0.8109 | 615 | 1.1094 | 33353000 |
| 0.2333 | 0.8175 | 620 | 1.1095 | 33621888 |
| 0.2159 | 0.8241 | 625 | 1.1076 | 33888104 |
| 0.2866 | 0.8307 | 630 | 1.1094 | 34159240 |
| 0.2268 | 0.8373 | 635 | 1.1101 | 34430064 |
| 0.1753 | 0.8439 | 640 | 1.1100 | 34700128 |
| 0.2076 | 0.8505 | 645 | 1.1089 | 34968768 |
| 0.1912 | 0.8571 | 650 | 1.1069 | 35250136 |
| 0.1534 | 0.8637 | 655 | 1.1074 | 35524024 |
| 0.1424 | 0.8703 | 660 | 1.1083 | 35789520 |
| 0.2325 | 0.8769 | 665 | 1.1076 | 36067376 |
| 0.2607 | 0.8835 | 670 | 1.1046 | 36340512 |
| 0.234 | 0.8901 | 675 | 1.1048 | 36603160 |
| 0.232 | 0.8967 | 680 | 1.1081 | 36872480 |
| 0.2998 | 0.9032 | 685 | 1.1080 | 37146736 |
| 0.1921 | 0.9098 | 690 | 1.1045 | 37414776 |
| 0.2492 | 0.9164 | 695 | 1.1060 | 37685600 |
| 0.27 | 0.9230 | 700 | 1.1068 | 37949648 |
| 0.2159 | 0.9296 | 705 | 1.1046 | 38226312 |
| 0.1912 | 0.9362 | 710 | 1.1062 | 38502072 |
| 0.23 | 0.9428 | 715 | 1.1076 | 38772744 |
| 0.3387 | 0.9494 | 720 | 1.1054 | 39041632 |
| 0.23 | 0.9560 | 725 | 1.1051 | 39313560 |
| 0.2785 | 0.9626 | 730 | 1.1065 | 39585992 |
| 0.2116 | 0.9692 | 735 | 1.1030 | 39856632 |
| 0.2378 | 0.9758 | 740 | 1.1040 | 40120176 |
| 0.2006 | 0.9824 | 745 | 1.1046 | 40392064 |
| 0.2418 | 0.9890 | 750 | 1.1024 | 40664776 |
| 0.2041 | 0.9955 | 755 | 1.1028 | 40931592 |
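
Tables like the one above are assembled from the Trainer's log history, which is also written to each checkpoint as `trainer_state.json`. A sketch of recovering the same columns from a saved checkpoint (the path is a placeholder; the token count appears only when `include_num_input_tokens_seen=True` was set, as in the sketch earlier):

```python
import json

# Placeholder path -- point this at a real checkpoint directory.
with open("checkpoint-755/trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "eval_loss" in entry:  # evaluation records carry the validation loss
        print(entry["epoch"], entry["step"], entry["eval_loss"],
              entry.get("num_input_tokens_seen"))  # None if counting was off
```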
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
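
A quick sanity check that a local environment matches these pins (package names as imported; note the card does not pin a `trl` version):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions listed above, as reported by the generated card.
expected = {
    transformers: "4.44.0",
    torch: "2.4.0+cu121",
    datasets: "2.20.0",
    tokenizers: "0.19.1",
}
for module, wanted in expected.items():
    status = "OK" if module.__version__ == wanted else f"expected {wanted}"
    print(f"{module.__name__}: {module.__version__} ({status})")
```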
|
|