metadata
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1
  results: []
collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0934
- Num Input Tokens Seen: 35818760
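For a rough sense of scale, the reported loss can be turned into a perplexity. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

# Final evaluation loss reported above.
# Assumption: mean cross-entropy per token, measured in nats.
eval_loss = 1.0934

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.3f}")
```

Under that assumption the evaluation perplexity is a little under 3.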
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
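The relationship between these values can be sketched in a few lines. The device count and the total step count below are assumptions: a single device is the only count consistent with 8 × 16 = 128, and the step count is estimated from the results table (step 660 is logged at epoch 0.9943). The warmup shape follows the usual meaning of `constant_with_warmup`, a linear ramp followed by a flat rate, and `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: one device, since 8 * 16 * 1 matches the reported total of 128.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: total optimizer steps estimated from the results table
# (step 660 corresponds to epoch 0.9943).
estimated_total_steps = round(660 / 0.9943)
warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to the base rate, then a constant rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, estimated_total_steps, warmup_steps)
print(lr_at(0), lr_at(17), lr_at(100))
```

With these estimates the learning rate would reach its full value of 8e-06 after roughly the first 34 optimizer steps and stay there for the rest of the epoch.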
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6217 | 0.0075 | 5 | 1.3867 | 271000 |
1.4655 | 0.0151 | 10 | 1.3414 | 546632 |
1.4425 | 0.0226 | 15 | 1.2752 | 819672 |
1.3352 | 0.0301 | 20 | 1.2179 | 1087416 |
1.1854 | 0.0377 | 25 | 1.1801 | 1356912 |
1.0295 | 0.0452 | 30 | 1.1849 | 1633120 |
0.9569 | 0.0527 | 35 | 1.1962 | 1902312 |
0.7022 | 0.0603 | 40 | 1.2303 | 2168672 |
0.7055 | 0.0678 | 45 | 1.2339 | 2435544 |
0.6248 | 0.0753 | 50 | 1.2358 | 2703648 |
0.5441 | 0.0829 | 55 | 1.2145 | 2967560 |
0.5434 | 0.0904 | 60 | 1.2004 | 3236944 |
0.4472 | 0.0979 | 65 | 1.1988 | 3506976 |
0.4555 | 0.1055 | 70 | 1.1838 | 3785080 |
0.4008 | 0.1130 | 75 | 1.1891 | 4055320 |
0.3689 | 0.1205 | 80 | 1.1814 | 4326912 |
0.3985 | 0.1280 | 85 | 1.1675 | 4595872 |
0.2766 | 0.1356 | 90 | 1.1743 | 4861152 |
0.3589 | 0.1431 | 95 | 1.1632 | 5135264 |
0.4281 | 0.1506 | 100 | 1.1654 | 5413792 |
0.2638 | 0.1582 | 105 | 1.1621 | 5686704 |
0.3134 | 0.1657 | 110 | 1.1585 | 5956968 |
0.4167 | 0.1732 | 115 | 1.1541 | 6224872 |
0.2923 | 0.1808 | 120 | 1.1566 | 6493312 |
0.4076 | 0.1883 | 125 | 1.1523 | 6775120 |
0.3545 | 0.1958 | 130 | 1.1504 | 7043896 |
0.2846 | 0.2034 | 135 | 1.1519 | 7311696 |
0.3653 | 0.2109 | 140 | 1.1472 | 7578920 |
0.3325 | 0.2184 | 145 | 1.1503 | 7845576 |
0.3284 | 0.2260 | 150 | 1.1466 | 8115408 |
0.2892 | 0.2335 | 155 | 1.1414 | 8385200 |
0.2424 | 0.2410 | 160 | 1.1451 | 8657328 |
0.2332 | 0.2486 | 165 | 1.1433 | 8935176 |
0.1998 | 0.2561 | 170 | 1.1409 | 9211448 |
0.304 | 0.2636 | 175 | 1.1400 | 9482072 |
0.3124 | 0.2712 | 180 | 1.1379 | 9753520 |
0.3096 | 0.2787 | 185 | 1.1429 | 10020056 |
0.3539 | 0.2862 | 190 | 1.1358 | 10292264 |
0.308 | 0.2938 | 195 | 1.1379 | 10554488 |
0.2535 | 0.3013 | 200 | 1.1357 | 10822488 |
0.3166 | 0.3088 | 205 | 1.1328 | 11097256 |
0.2653 | 0.3164 | 210 | 1.1327 | 11376640 |
0.2697 | 0.3239 | 215 | 1.1351 | 11643032 |
0.2742 | 0.3314 | 220 | 1.1293 | 11919368 |
0.3344 | 0.3390 | 225 | 1.1314 | 12187896 |
0.1981 | 0.3465 | 230 | 1.1284 | 12461560 |
0.2823 | 0.3540 | 235 | 1.1275 | 12733568 |
0.3029 | 0.3615 | 240 | 1.1289 | 12999600 |
0.3232 | 0.3691 | 245 | 1.1257 | 13267680 |
0.2336 | 0.3766 | 250 | 1.1287 | 13533656 |
0.2642 | 0.3841 | 255 | 1.1263 | 13808592 |
0.3177 | 0.3917 | 260 | 1.1228 | 14075880 |
0.284 | 0.3992 | 265 | 1.1247 | 14343328 |
0.3039 | 0.4067 | 270 | 1.1206 | 14612480 |
0.2793 | 0.4143 | 275 | 1.1206 | 14882944 |
0.3073 | 0.4218 | 280 | 1.1250 | 15154088 |
0.3092 | 0.4293 | 285 | 1.1196 | 15420928 |
0.2349 | 0.4369 | 290 | 1.1192 | 15691528 |
0.1937 | 0.4444 | 295 | 1.1194 | 15966376 |
0.3677 | 0.4519 | 300 | 1.1175 | 16235816 |
0.1964 | 0.4595 | 305 | 1.1174 | 16503712 |
0.3342 | 0.4670 | 310 | 1.1173 | 16780344 |
0.2434 | 0.4745 | 315 | 1.1193 | 17047624 |
0.3076 | 0.4821 | 320 | 1.1144 | 17315800 |
0.2931 | 0.4896 | 325 | 1.1149 | 17589048 |
0.2965 | 0.4971 | 330 | 1.1140 | 17850624 |
0.3294 | 0.5047 | 335 | 1.1122 | 18123168 |
0.3072 | 0.5122 | 340 | 1.1134 | 18404496 |
0.1833 | 0.5197 | 345 | 1.1117 | 18672712 |
0.2871 | 0.5273 | 350 | 1.1118 | 18942920 |
0.2124 | 0.5348 | 355 | 1.1119 | 19214880 |
0.3152 | 0.5423 | 360 | 1.1098 | 19486872 |
0.2688 | 0.5499 | 365 | 1.1115 | 19750920 |
0.2113 | 0.5574 | 370 | 1.1113 | 20021312 |
0.2936 | 0.5649 | 375 | 1.1104 | 20291192 |
0.1659 | 0.5725 | 380 | 1.1079 | 20554376 |
0.2615 | 0.5800 | 385 | 1.1091 | 20820304 |
0.1893 | 0.5875 | 390 | 1.1092 | 21088216 |
0.2997 | 0.5950 | 395 | 1.1076 | 21356104 |
0.2985 | 0.6026 | 400 | 1.1055 | 21624024 |
0.2521 | 0.6101 | 405 | 1.1069 | 21901144 |
0.2243 | 0.6176 | 410 | 1.1078 | 22177408 |
0.2994 | 0.6252 | 415 | 1.1041 | 22446056 |
0.1927 | 0.6327 | 420 | 1.1061 | 22712816 |
0.204 | 0.6402 | 425 | 1.1064 | 22989840 |
0.2584 | 0.6478 | 430 | 1.1028 | 23260064 |
0.2422 | 0.6553 | 435 | 1.1029 | 23530560 |
0.2784 | 0.6628 | 440 | 1.1048 | 23803448 |
0.2613 | 0.6704 | 445 | 1.1038 | 24068080 |
0.227 | 0.6779 | 450 | 1.1019 | 24333176 |
0.2461 | 0.6854 | 455 | 1.1031 | 24603392 |
0.1918 | 0.6930 | 460 | 1.1035 | 24876384 |
0.2125 | 0.7005 | 465 | 1.1012 | 25140928 |
0.2905 | 0.7080 | 470 | 1.1015 | 25405968 |
0.1957 | 0.7156 | 475 | 1.1019 | 25677032 |
0.1903 | 0.7231 | 480 | 1.1001 | 25949848 |
0.2938 | 0.7306 | 485 | 1.1011 | 26219712 |
0.2621 | 0.7382 | 490 | 1.1027 | 26491816 |
0.2448 | 0.7457 | 495 | 1.1013 | 26760152 |
0.2177 | 0.7532 | 500 | 1.1003 | 27026592 |
0.3036 | 0.7608 | 505 | 1.1006 | 27298440 |
0.2885 | 0.7683 | 510 | 1.0999 | 27571464 |
0.3118 | 0.7758 | 515 | 1.0983 | 27843400 |
0.2362 | 0.7834 | 520 | 1.0990 | 28113024 |
0.2036 | 0.7909 | 525 | 1.0983 | 28381952 |
0.3301 | 0.7984 | 530 | 1.0979 | 28654648 |
0.3089 | 0.8060 | 535 | 1.0977 | 28927576 |
0.2125 | 0.8135 | 540 | 1.0983 | 29196512 |
0.1817 | 0.8210 | 545 | 1.0985 | 29471184 |
0.3252 | 0.8285 | 550 | 1.0975 | 29742216 |
0.2176 | 0.8361 | 555 | 1.0970 | 30010528 |
0.2441 | 0.8436 | 560 | 1.0972 | 30278888 |
0.2678 | 0.8511 | 565 | 1.0980 | 30549480 |
0.2069 | 0.8587 | 570 | 1.0959 | 30816968 |
0.2432 | 0.8662 | 575 | 1.0961 | 31089360 |
0.1981 | 0.8737 | 580 | 1.0974 | 31354488 |
0.2415 | 0.8813 | 585 | 1.0952 | 31624248 |
0.2379 | 0.8888 | 590 | 1.0944 | 31891576 |
0.2349 | 0.8963 | 595 | 1.0963 | 32153000 |
0.1643 | 0.9039 | 600 | 1.0952 | 32419552 |
0.2094 | 0.9114 | 605 | 1.0951 | 32692032 |
0.2806 | 0.9189 | 610 | 1.0931 | 32959216 |
0.2184 | 0.9265 | 615 | 1.0937 | 33229304 |
0.2943 | 0.9340 | 620 | 1.0938 | 33500168 |
0.2098 | 0.9415 | 625 | 1.0940 | 33767344 |
0.214 | 0.9491 | 630 | 1.0939 | 34035680 |
0.3333 | 0.9566 | 635 | 1.0934 | 34304400 |
0.3684 | 0.9641 | 640 | 1.0933 | 34573040 |
0.204 | 0.9717 | 645 | 1.0951 | 34840664 |
0.2766 | 0.9792 | 650 | 1.0946 | 35106576 |
0.233 | 0.9867 | 655 | 1.0934 | 35378576 |
0.2654 | 0.9943 | 660 | 1.0939 | 35656264 |
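A short script can summarize the table. The sketch below copies a handful of rows rather than the full log, picks the row with the lowest validation loss, and estimates throughput per optimizer step. The figure of 128 sequences per step is taken from `total_train_batch_size` and is an assumption about how the token counter relates to batches.

```python
# A few rows copied from the table above: (step, validation_loss, input_tokens_seen).
rows = [
    (0, 1.3909, 0),
    (25, 1.1801, 1356912),
    (50, 1.2358, 2703648),
    (610, 1.0931, 32959216),
    (660, 1.0939, 35656264),
]

# Row with the lowest validation loss among the excerpted rows.
best_step, best_loss, _ = min(rows, key=lambda r: r[1])

# Average input tokens consumed per optimizer step up to the last logged row.
last_step, _, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

# Assumption: 128 sequences per optimizer step (total_train_batch_size).
tokens_per_sequence = tokens_per_step / 128

print(f"best validation loss {best_loss} at step {best_step}")
print(f"~{tokens_per_step:.0f} tokens per step, ~{tokens_per_sequence:.0f} per sequence")
```

The lowest validation loss in the full table is 1.0931 at step 610. Validation loss bottoms out early at 1.1801 (step 25), rises to 1.2358 by step 50, and then declines slowly, while training loss falls to roughly 0.2 to 0.3.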
Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
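To compare a local environment against these versions, something like the following may help. It is a minimal sketch using only the standard library. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to match the installed package names, and `check_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by the assumed distribution names.
expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected_versions: dict[str, str]) -> dict[str, str]:
    """Return one status per package: 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, wanted in expected_versions.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"found {found}, expected {wanted}"
    return report


for name, status in check_versions(expected).items():
    print(f"{name}: {status}")
```

The comparison is an exact string match, so a PyTorch build without the `+cu121` suffix would be reported as a mismatch even if the base version agrees.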