|
--- |
|
license: gemma |
|
base_model: google/gemma-2-2b |
|
tags: |
|
- trl |
|
- sft |
|
- generated_from_trainer |
|
model-index: |
|
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1 |
|
  results: []
|
--- |
|
|
|
# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1 |
|
|
|
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. |
|
It achieves the following results on the evaluation set (a perplexity conversion is sketched below):
|
- Loss: 1.0934 |
|
- Num input tokens seen: 35,818,760
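
Assuming the loss is the Trainer's standard mean per-token cross-entropy (in nats) for causal language modeling, it converts to perplexity by exponentiation:

```python
# Perplexity from mean token cross-entropy (nats): ppl = exp(loss).
import math

eval_loss = 1.0934
print(f"perplexity ~= {math.exp(eval_loss):.2f}")  # ~2.98
```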
|
|
|
## Model description |
|
|
|
This checkpoint is a supervised fine-tune (SFT) of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) produced with the TRL library, as indicated by the `trl` and `sft` tags above. Beyond the hyperparameters and loss curve recorded below, no further details about the task, data, or objective have been documented.
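
As a minimal loading sketch (hedged: the repository id below is a placeholder, since this card does not state where the checkpoint is hosted):

```python
# Minimal loading sketch -- the repo id is a placeholder; substitute the
# actual Hub repository that hosts this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<org>/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1"  # placeholder org

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Generate a short continuation as a smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```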
|
|
|
## Intended uses & limitations |
|
|
|
More information needed. Note that any use of this model is subject to the base model's [Gemma Terms of Use](https://ai.google.dev/gemma/terms) (see `license: gemma` above).
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a hedged TRL sketch reproducing them follows the list):
|
- learning_rate: 8e-06 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 16 |
|
- seed: 1 |
|
- gradient_accumulation_steps: 16 |
|
- total_train_batch_size: 128 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: constant_with_warmup |
|
- lr_scheduler_warmup_ratio: 0.05 |
|
- num_epochs: 1 |
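
For orientation, here is a minimal sketch of how these settings map onto a TRL `SFTTrainer` run. It is an illustration under stated assumptions, not the actual training script: the dataset, split names, and text field are placeholders, and the keyword arguments assume a TRL release contemporaneous with the framework versions listed at the end of this card.

```python
# Hedged reproduction sketch: maps the hyperparameters above onto TRL's
# SFTTrainer. The dataset path, splits, and text field are placeholders --
# the data used for this checkpoint is not documented in this card.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

train_dataset = load_dataset("path/to/dataset", split="train")      # placeholder
eval_dataset = load_dataset("path/to/dataset", split="validation")  # placeholder

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    gradient_accumulation_steps=16,   # 8 * 16 = total_train_batch_size 128
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",
    eval_steps=5,                     # matches the evaluation cadence in the table below
    logging_steps=5,
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",        # placeholder field name
)
trainer.train()
```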
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen | |
|
|:-------------:|:------:|:----:|:---------------:|:-----------------:| |
|
| No log | 0 | 0 | 1.3909 | 0 | |
|
| 1.6217 | 0.0075 | 5 | 1.3867 | 271000 | |
|
| 1.4655 | 0.0151 | 10 | 1.3414 | 546632 | |
|
| 1.4425 | 0.0226 | 15 | 1.2752 | 819672 | |
|
| 1.3352 | 0.0301 | 20 | 1.2179 | 1087416 | |
|
| 1.1854 | 0.0377 | 25 | 1.1801 | 1356912 | |
|
| 1.0295 | 0.0452 | 30 | 1.1849 | 1633120 | |
|
| 0.9569 | 0.0527 | 35 | 1.1962 | 1902312 | |
|
| 0.7022 | 0.0603 | 40 | 1.2303 | 2168672 | |
|
| 0.7055 | 0.0678 | 45 | 1.2339 | 2435544 | |
|
| 0.6248 | 0.0753 | 50 | 1.2358 | 2703648 | |
|
| 0.5441 | 0.0829 | 55 | 1.2145 | 2967560 | |
|
| 0.5434 | 0.0904 | 60 | 1.2004 | 3236944 | |
|
| 0.4472 | 0.0979 | 65 | 1.1988 | 3506976 | |
|
| 0.4555 | 0.1055 | 70 | 1.1838 | 3785080 | |
|
| 0.4008 | 0.1130 | 75 | 1.1891 | 4055320 | |
|
| 0.3689 | 0.1205 | 80 | 1.1814 | 4326912 | |
|
| 0.3985 | 0.1280 | 85 | 1.1675 | 4595872 | |
|
| 0.2766 | 0.1356 | 90 | 1.1743 | 4861152 | |
|
| 0.3589 | 0.1431 | 95 | 1.1632 | 5135264 | |
|
| 0.4281 | 0.1506 | 100 | 1.1654 | 5413792 | |
|
| 0.2638 | 0.1582 | 105 | 1.1621 | 5686704 | |
|
| 0.3134 | 0.1657 | 110 | 1.1585 | 5956968 | |
|
| 0.4167 | 0.1732 | 115 | 1.1541 | 6224872 | |
|
| 0.2923 | 0.1808 | 120 | 1.1566 | 6493312 | |
|
| 0.4076 | 0.1883 | 125 | 1.1523 | 6775120 | |
|
| 0.3545 | 0.1958 | 130 | 1.1504 | 7043896 | |
|
| 0.2846 | 0.2034 | 135 | 1.1519 | 7311696 | |
|
| 0.3653 | 0.2109 | 140 | 1.1472 | 7578920 | |
|
| 0.3325 | 0.2184 | 145 | 1.1503 | 7845576 | |
|
| 0.3284 | 0.2260 | 150 | 1.1466 | 8115408 | |
|
| 0.2892 | 0.2335 | 155 | 1.1414 | 8385200 | |
|
| 0.2424 | 0.2410 | 160 | 1.1451 | 8657328 | |
|
| 0.2332 | 0.2486 | 165 | 1.1433 | 8935176 | |
|
| 0.1998 | 0.2561 | 170 | 1.1409 | 9211448 | |
|
| 0.304 | 0.2636 | 175 | 1.1400 | 9482072 | |
|
| 0.3124 | 0.2712 | 180 | 1.1379 | 9753520 | |
|
| 0.3096 | 0.2787 | 185 | 1.1429 | 10020056 | |
|
| 0.3539 | 0.2862 | 190 | 1.1358 | 10292264 | |
|
| 0.308 | 0.2938 | 195 | 1.1379 | 10554488 | |
|
| 0.2535 | 0.3013 | 200 | 1.1357 | 10822488 | |
|
| 0.3166 | 0.3088 | 205 | 1.1328 | 11097256 | |
|
| 0.2653 | 0.3164 | 210 | 1.1327 | 11376640 | |
|
| 0.2697 | 0.3239 | 215 | 1.1351 | 11643032 | |
|
| 0.2742 | 0.3314 | 220 | 1.1293 | 11919368 | |
|
| 0.3344 | 0.3390 | 225 | 1.1314 | 12187896 | |
|
| 0.1981 | 0.3465 | 230 | 1.1284 | 12461560 | |
|
| 0.2823 | 0.3540 | 235 | 1.1275 | 12733568 | |
|
| 0.3029 | 0.3615 | 240 | 1.1289 | 12999600 | |
|
| 0.3232 | 0.3691 | 245 | 1.1257 | 13267680 | |
|
| 0.2336 | 0.3766 | 250 | 1.1287 | 13533656 | |
|
| 0.2642 | 0.3841 | 255 | 1.1263 | 13808592 | |
|
| 0.3177 | 0.3917 | 260 | 1.1228 | 14075880 | |
|
| 0.284 | 0.3992 | 265 | 1.1247 | 14343328 | |
|
| 0.3039 | 0.4067 | 270 | 1.1206 | 14612480 | |
|
| 0.2793 | 0.4143 | 275 | 1.1206 | 14882944 | |
|
| 0.3073 | 0.4218 | 280 | 1.1250 | 15154088 | |
|
| 0.3092 | 0.4293 | 285 | 1.1196 | 15420928 | |
|
| 0.2349 | 0.4369 | 290 | 1.1192 | 15691528 | |
|
| 0.1937 | 0.4444 | 295 | 1.1194 | 15966376 | |
|
| 0.3677 | 0.4519 | 300 | 1.1175 | 16235816 | |
|
| 0.1964 | 0.4595 | 305 | 1.1174 | 16503712 | |
|
| 0.3342 | 0.4670 | 310 | 1.1173 | 16780344 | |
|
| 0.2434 | 0.4745 | 315 | 1.1193 | 17047624 | |
|
| 0.3076 | 0.4821 | 320 | 1.1144 | 17315800 | |
|
| 0.2931 | 0.4896 | 325 | 1.1149 | 17589048 | |
|
| 0.2965 | 0.4971 | 330 | 1.1140 | 17850624 | |
|
| 0.3294 | 0.5047 | 335 | 1.1122 | 18123168 | |
|
| 0.3072 | 0.5122 | 340 | 1.1134 | 18404496 | |
|
| 0.1833 | 0.5197 | 345 | 1.1117 | 18672712 | |
|
| 0.2871 | 0.5273 | 350 | 1.1118 | 18942920 | |
|
| 0.2124 | 0.5348 | 355 | 1.1119 | 19214880 | |
|
| 0.3152 | 0.5423 | 360 | 1.1098 | 19486872 | |
|
| 0.2688 | 0.5499 | 365 | 1.1115 | 19750920 | |
|
| 0.2113 | 0.5574 | 370 | 1.1113 | 20021312 | |
|
| 0.2936 | 0.5649 | 375 | 1.1104 | 20291192 | |
|
| 0.1659 | 0.5725 | 380 | 1.1079 | 20554376 | |
|
| 0.2615 | 0.5800 | 385 | 1.1091 | 20820304 | |
|
| 0.1893 | 0.5875 | 390 | 1.1092 | 21088216 | |
|
| 0.2997 | 0.5950 | 395 | 1.1076 | 21356104 | |
|
| 0.2985 | 0.6026 | 400 | 1.1055 | 21624024 | |
|
| 0.2521 | 0.6101 | 405 | 1.1069 | 21901144 | |
|
| 0.2243 | 0.6176 | 410 | 1.1078 | 22177408 | |
|
| 0.2994 | 0.6252 | 415 | 1.1041 | 22446056 | |
|
| 0.1927 | 0.6327 | 420 | 1.1061 | 22712816 | |
|
| 0.204 | 0.6402 | 425 | 1.1064 | 22989840 | |
|
| 0.2584 | 0.6478 | 430 | 1.1028 | 23260064 | |
|
| 0.2422 | 0.6553 | 435 | 1.1029 | 23530560 | |
|
| 0.2784 | 0.6628 | 440 | 1.1048 | 23803448 | |
|
| 0.2613 | 0.6704 | 445 | 1.1038 | 24068080 | |
|
| 0.227 | 0.6779 | 450 | 1.1019 | 24333176 | |
|
| 0.2461 | 0.6854 | 455 | 1.1031 | 24603392 | |
|
| 0.1918 | 0.6930 | 460 | 1.1035 | 24876384 | |
|
| 0.2125 | 0.7005 | 465 | 1.1012 | 25140928 | |
|
| 0.2905 | 0.7080 | 470 | 1.1015 | 25405968 | |
|
| 0.1957 | 0.7156 | 475 | 1.1019 | 25677032 | |
|
| 0.1903 | 0.7231 | 480 | 1.1001 | 25949848 | |
|
| 0.2938 | 0.7306 | 485 | 1.1011 | 26219712 | |
|
| 0.2621 | 0.7382 | 490 | 1.1027 | 26491816 | |
|
| 0.2448 | 0.7457 | 495 | 1.1013 | 26760152 | |
|
| 0.2177 | 0.7532 | 500 | 1.1003 | 27026592 | |
|
| 0.3036 | 0.7608 | 505 | 1.1006 | 27298440 | |
|
| 0.2885 | 0.7683 | 510 | 1.0999 | 27571464 | |
|
| 0.3118 | 0.7758 | 515 | 1.0983 | 27843400 | |
|
| 0.2362 | 0.7834 | 520 | 1.0990 | 28113024 | |
|
| 0.2036 | 0.7909 | 525 | 1.0983 | 28381952 | |
|
| 0.3301 | 0.7984 | 530 | 1.0979 | 28654648 | |
|
| 0.3089 | 0.8060 | 535 | 1.0977 | 28927576 | |
|
| 0.2125 | 0.8135 | 540 | 1.0983 | 29196512 | |
|
| 0.1817 | 0.8210 | 545 | 1.0985 | 29471184 | |
|
| 0.3252 | 0.8285 | 550 | 1.0975 | 29742216 | |
|
| 0.2176 | 0.8361 | 555 | 1.0970 | 30010528 | |
|
| 0.2441 | 0.8436 | 560 | 1.0972 | 30278888 | |
|
| 0.2678 | 0.8511 | 565 | 1.0980 | 30549480 | |
|
| 0.2069 | 0.8587 | 570 | 1.0959 | 30816968 | |
|
| 0.2432 | 0.8662 | 575 | 1.0961 | 31089360 | |
|
| 0.1981 | 0.8737 | 580 | 1.0974 | 31354488 | |
|
| 0.2415 | 0.8813 | 585 | 1.0952 | 31624248 | |
|
| 0.2379 | 0.8888 | 590 | 1.0944 | 31891576 | |
|
| 0.2349 | 0.8963 | 595 | 1.0963 | 32153000 | |
|
| 0.1643 | 0.9039 | 600 | 1.0952 | 32419552 | |
|
| 0.2094 | 0.9114 | 605 | 1.0951 | 32692032 | |
|
| 0.2806 | 0.9189 | 610 | 1.0931 | 32959216 | |
|
| 0.2184 | 0.9265 | 615 | 1.0937 | 33229304 | |
|
| 0.2943 | 0.9340 | 620 | 1.0938 | 33500168 | |
|
| 0.2098 | 0.9415 | 625 | 1.0940 | 33767344 | |
|
| 0.214 | 0.9491 | 630 | 1.0939 | 34035680 | |
|
| 0.3333 | 0.9566 | 635 | 1.0934 | 34304400 | |
|
| 0.3684 | 0.9641 | 640 | 1.0933 | 34573040 | |
|
| 0.204 | 0.9717 | 645 | 1.0951 | 34840664 | |
|
| 0.2766 | 0.9792 | 650 | 1.0946 | 35106576 | |
|
| 0.233 | 0.9867 | 655 | 1.0934 | 35378576 | |
|
| 0.2654 | 0.9943 | 660 | 1.0939 | 35656264 | |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.44.0 |
|
- PyTorch 2.4.0+cu121
|
- Datasets 2.20.0 |
|
- Tokenizers 0.19.1 |
|
|