collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0939
  • Num Input Tokens Seen: 36687080
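
A minimal loading sketch, assuming the checkpoint is published as RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0 and is used through the standard transformers text-generation API; the prompt is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0"

# Load tokenizer and weights; bfloat16 matches the BF16 tensor type of the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"  # illustrative prompt, not from the (undocumented) training data
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```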

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
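
The per-device train batch size of 8 combined with 16 gradient accumulation steps yields the listed effective batch size of 8 × 16 = 128. Below is a sketch of an equivalent TrainingArguments configuration; the output directory, the bf16 flag, and all Trainer/dataset wiring are assumptions, since only the hyperparameters above are documented:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0",  # hypothetical path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective train batch size
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```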

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5743 | 0.0076 | 5 | 1.3850 | 286024 |
| 1.5698 | 0.0152 | 10 | 1.3359 | 565176 |
| 1.5023 | 0.0227 | 15 | 1.2721 | 843224 |
| 1.3784 | 0.0303 | 20 | 1.2210 | 1128808 |
| 1.1853 | 0.0379 | 25 | 1.1834 | 1409632 |
| 1.079 | 0.0455 | 30 | 1.1911 | 1688000 |
| 0.9274 | 0.0531 | 35 | 1.2022 | 1961576 |
| 0.8275 | 0.0607 | 40 | 1.2078 | 2242896 |
| 0.6817 | 0.0682 | 45 | 1.2485 | 2524032 |
| 0.5892 | 0.0758 | 50 | 1.2344 | 2801792 |
| 0.4418 | 0.0834 | 55 | 1.2415 | 3078040 |
| 0.4992 | 0.0910 | 60 | 1.1980 | 3358368 |
| 0.4529 | 0.0986 | 65 | 1.2040 | 3643320 |
| 0.4315 | 0.1062 | 70 | 1.2063 | 3920184 |
| 0.3633 | 0.1137 | 75 | 1.1887 | 4195744 |
| 0.3498 | 0.1213 | 80 | 1.1900 | 4474088 |
| 0.5205 | 0.1289 | 85 | 1.1810 | 4750552 |
| 0.4456 | 0.1365 | 90 | 1.1784 | 5033120 |
| 0.2259 | 0.1441 | 95 | 1.1689 | 5308224 |
| 0.2957 | 0.1517 | 100 | 1.1673 | 5584192 |
| 0.2861 | 0.1592 | 105 | 1.1622 | 5855384 |
| 0.396 | 0.1668 | 110 | 1.1576 | 6135472 |
| 0.2727 | 0.1744 | 115 | 1.1593 | 6417808 |
| 0.2863 | 0.1820 | 120 | 1.1536 | 6694768 |
| 0.3506 | 0.1896 | 125 | 1.1512 | 6974920 |
| 0.3593 | 0.1972 | 130 | 1.1506 | 7250952 |
| 0.3129 | 0.2047 | 135 | 1.1464 | 7528424 |
| 0.305 | 0.2123 | 140 | 1.1471 | 7796288 |
| 0.2969 | 0.2199 | 145 | 1.1458 | 8071736 |
| 0.3828 | 0.2275 | 150 | 1.1450 | 8354136 |
| 0.2908 | 0.2351 | 155 | 1.1426 | 8627856 |
| 0.3691 | 0.2427 | 160 | 1.1403 | 8906272 |
| 0.248 | 0.2502 | 165 | 1.1434 | 9190272 |
| 0.2853 | 0.2578 | 170 | 1.1398 | 9467688 |
| 0.336 | 0.2654 | 175 | 1.1423 | 9745264 |
| 0.2295 | 0.2730 | 180 | 1.1392 | 10022808 |
| 0.2522 | 0.2806 | 185 | 1.1382 | 10307056 |
| 0.2513 | 0.2882 | 190 | 1.1442 | 10582992 |
| 0.2799 | 0.2957 | 195 | 1.1370 | 10866240 |
| 0.2176 | 0.3033 | 200 | 1.1359 | 11148368 |
| 0.293 | 0.3109 | 205 | 1.1353 | 11433232 |
| 0.3076 | 0.3185 | 210 | 1.1317 | 11705656 |
| 0.2469 | 0.3261 | 215 | 1.1337 | 11983632 |
| 0.3734 | 0.3336 | 220 | 1.1323 | 12266112 |
| 0.2704 | 0.3412 | 225 | 1.1290 | 12547976 |
| 0.3469 | 0.3488 | 230 | 1.1300 | 12824592 |
| 0.3266 | 0.3564 | 235 | 1.1280 | 13098760 |
| 0.2528 | 0.3640 | 240 | 1.1268 | 13368616 |
| 0.2867 | 0.3716 | 245 | 1.1266 | 13650008 |
| 0.228 | 0.3791 | 250 | 1.1262 | 13927240 |
| 0.233 | 0.3867 | 255 | 1.1249 | 14203184 |
| 0.2724 | 0.3943 | 260 | 1.1250 | 14475384 |
| 0.2117 | 0.4019 | 265 | 1.1245 | 14760384 |
| 0.1981 | 0.4095 | 270 | 1.1226 | 15040960 |
| 0.2519 | 0.4171 | 275 | 1.1219 | 15323064 |
| 0.4068 | 0.4246 | 280 | 1.1205 | 15603904 |
| 0.2811 | 0.4322 | 285 | 1.1214 | 15883608 |
| 0.259 | 0.4398 | 290 | 1.1201 | 16159520 |
| 0.2938 | 0.4474 | 295 | 1.1208 | 16437656 |
| 0.2466 | 0.4550 | 300 | 1.1214 | 16716952 |
| 0.2997 | 0.4626 | 305 | 1.1162 | 16992344 |
| 0.2268 | 0.4701 | 310 | 1.1229 | 17268760 |
| 0.343 | 0.4777 | 315 | 1.1172 | 17547648 |
| 0.2424 | 0.4853 | 320 | 1.1154 | 17828288 |
| 0.2849 | 0.4929 | 325 | 1.1172 | 18107576 |
| 0.478 | 0.5005 | 330 | 1.1155 | 18387728 |
| 0.1959 | 0.5081 | 335 | 1.1162 | 18667088 |
| 0.1868 | 0.5156 | 340 | 1.1160 | 18950480 |
| 0.234 | 0.5232 | 345 | 1.1150 | 19228760 |
| 0.2519 | 0.5308 | 350 | 1.1135 | 19508952 |
| 0.2625 | 0.5384 | 355 | 1.1145 | 19787448 |
| 0.3843 | 0.5460 | 360 | 1.1109 | 20073168 |
| 0.3005 | 0.5536 | 365 | 1.1109 | 20343008 |
| 0.1833 | 0.5611 | 370 | 1.1110 | 20623352 |
| 0.2446 | 0.5687 | 375 | 1.1093 | 20901240 |
| 0.25 | 0.5763 | 380 | 1.1104 | 21185296 |
| 0.2897 | 0.5839 | 385 | 1.1103 | 21464672 |
| 0.168 | 0.5915 | 390 | 1.1099 | 21743520 |
| 0.2387 | 0.5991 | 395 | 1.1106 | 22023544 |
| 0.2066 | 0.6066 | 400 | 1.1072 | 22291944 |
| 0.2191 | 0.6142 | 405 | 1.1089 | 22572096 |
| 0.1869 | 0.6218 | 410 | 1.1085 | 22849472 |
| 0.1939 | 0.6294 | 415 | 1.1075 | 23126440 |
| 0.2368 | 0.6370 | 420 | 1.1091 | 23406096 |
| 0.2209 | 0.6445 | 425 | 1.1066 | 23678072 |
| 0.2523 | 0.6521 | 430 | 1.1077 | 23961192 |
| 0.2416 | 0.6597 | 435 | 1.1082 | 24240520 |
| 0.1964 | 0.6673 | 440 | 1.1057 | 24520856 |
| 0.2369 | 0.6749 | 445 | 1.1055 | 24798288 |
| 0.23 | 0.6825 | 450 | 1.1074 | 25075848 |
| 0.2349 | 0.6900 | 455 | 1.1046 | 25344112 |
| 0.243 | 0.6976 | 460 | 1.1063 | 25625216 |
| 0.3343 | 0.7052 | 465 | 1.1066 | 25901904 |
| 0.2341 | 0.7128 | 470 | 1.1042 | 26177128 |
| 0.283 | 0.7204 | 475 | 1.1059 | 26459400 |
| 0.3112 | 0.7280 | 480 | 1.1066 | 26736784 |
| 0.3015 | 0.7355 | 485 | 1.1042 | 27017152 |
| 0.2788 | 0.7431 | 490 | 1.1031 | 27295048 |
| 0.1838 | 0.7507 | 495 | 1.1025 | 27575392 |
| 0.2366 | 0.7583 | 500 | 1.1036 | 27852328 |
| 0.297 | 0.7659 | 505 | 1.1032 | 28130032 |
| 0.1622 | 0.7735 | 510 | 1.1015 | 28407672 |
| 0.165 | 0.7810 | 515 | 1.1012 | 28680696 |
| 0.3047 | 0.7886 | 520 | 1.1010 | 28957216 |
| 0.336 | 0.7962 | 525 | 1.1012 | 29235048 |
| 0.2728 | 0.8038 | 530 | 1.1011 | 29507352 |
| 0.2007 | 0.8114 | 535 | 1.1008 | 29778208 |
| 0.2253 | 0.8190 | 540 | 1.1013 | 30055416 |
| 0.2386 | 0.8265 | 545 | 1.0982 | 30333728 |
| 0.2056 | 0.8341 | 550 | 1.0989 | 30599088 |
| 0.2879 | 0.8417 | 555 | 1.1003 | 30883072 |
| 0.2207 | 0.8493 | 560 | 1.0993 | 31160232 |
| 0.2821 | 0.8569 | 565 | 1.0979 | 31441272 |
| 0.2246 | 0.8645 | 570 | 1.0982 | 31712696 |
| 0.3249 | 0.8720 | 575 | 1.0980 | 31991400 |
| 0.2616 | 0.8796 | 580 | 1.0985 | 32269224 |
| 0.2716 | 0.8872 | 585 | 1.0997 | 32542384 |
| 0.2898 | 0.8948 | 590 | 1.0979 | 32826016 |
| 0.2617 | 0.9024 | 595 | 1.0968 | 33110848 |
| 0.2057 | 0.9100 | 600 | 1.0988 | 33391352 |
| 0.293 | 0.9175 | 605 | 1.0965 | 33670472 |
| 0.2081 | 0.9251 | 610 | 1.0947 | 33950936 |
| 0.2801 | 0.9327 | 615 | 1.0963 | 34226952 |
| 0.2678 | 0.9403 | 620 | 1.0952 | 34502376 |
| 0.222 | 0.9479 | 625 | 1.0944 | 34774480 |
| 0.2561 | 0.9555 | 630 | 1.0944 | 35057720 |
| 0.2738 | 0.9630 | 635 | 1.0947 | 35333096 |
| 0.182 | 0.9706 | 640 | 1.0947 | 35614552 |
| 0.224 | 0.9782 | 645 | 1.0935 | 35890992 |
| 0.2861 | 0.9858 | 650 | 1.0935 | 36177736 |
| 0.2674 | 0.9934 | 655 | 1.0948 | 36462944 |
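
To visualize the trend in the table, the sketch below plots a handful of (input tokens seen, validation loss) pairs copied from the rows above; matplotlib is an assumption here and not part of the training setup:

```python
import matplotlib.pyplot as plt

# A subset of (input tokens seen, validation loss) points taken from the table above.
tokens_seen = [0, 1_409_632, 5_584_192, 11_148_368, 16_716_952,
               22_291_944, 27_852_328, 33_391_352, 36_462_944]
val_loss = [1.3909, 1.1834, 1.1673, 1.1359, 1.1214,
            1.1072, 1.1036, 1.0988, 1.0948]

plt.plot(tokens_seen, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("Validation loss vs. input tokens seen")
plt.show()
```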

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
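
A requirements.txt-style sketch pinning these versions (torch is the pip package name for PyTorch; the +cu121 build suffix depends on the local CUDA toolchain and is an assumption about how the wheel was installed):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```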