collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1045
  • Num Input Tokens Seen: 36166336
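
If the reported loss is a mean token-level cross-entropy on the evaluation set (the usual Trainer convention; the card does not say), it corresponds to a perplexity of exp(1.1045) ≈ 3.02:

```python
# Sketch: eval loss -> perplexity, assuming the loss is a mean token-level cross-entropy.
import math

eval_loss = 1.1045
print(math.exp(eval_loss))  # ~3.02
```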

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a Trainer configuration sketch reproducing them follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
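
The list above maps directly onto a standard transformers Trainer setup. The sketch below is a reconstruction from those values, not the original training script; the output directory, the bf16 flag, and everything outside TrainingArguments are assumptions.

```python
# Reconstruction of the listed hyperparameters as transformers TrainingArguments.
# This is a sketch for reference, not the published training code.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=8,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    gradient_accumulation_steps=16,   # 8 x 16 = 128 total train batch size
    seed=2,
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption, consistent with the BF16 tensor type reported below
)
```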

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3909 0
1.6049 0.0075 5 1.3862 273640
1.6224 0.0151 10 1.3404 554216
1.4024 0.0226 15 1.2742 825824
1.3776 0.0302 20 1.2246 1100896
1.2832 0.0377 25 1.1803 1379192
1.22 0.0452 30 1.1783 1656944
0.9584 0.0528 35 1.1731 1925784
0.8881 0.0603 40 1.2068 2192080
0.8391 0.0678 45 1.2100 2459864
0.7926 0.0754 50 1.2160 2736544
0.647 0.0829 55 1.2217 3005032
0.6438 0.0905 60 1.2151 3277256
0.5487 0.0980 65 1.2157 3547224
0.536 0.1055 70 1.2048 3817448
0.4943 0.1131 75 1.1964 4094432
0.5394 0.1206 80 1.1933 4367400
0.3851 0.1282 85 1.1909 4635248
0.4303 0.1357 90 1.1893 4903792
0.4199 0.1432 95 1.1818 5173464
0.3878 0.1508 100 1.1820 5446408
0.4044 0.1583 105 1.1846 5722824
0.3266 0.1658 110 1.1800 5998616
0.3367 0.1734 115 1.1756 6269328
0.2639 0.1809 120 1.1786 6542264
0.2647 0.1885 125 1.1753 6813600
0.3762 0.1960 130 1.1739 7087552
0.3209 0.2035 135 1.1699 7360376
0.3376 0.2111 140 1.1709 7632536
0.2674 0.2186 145 1.1719 7901296
0.2631 0.2262 150 1.1681 8167576
0.3092 0.2337 155 1.1664 8438360
0.3305 0.2412 160 1.1669 8709792
0.3066 0.2488 165 1.1607 8988856
0.2807 0.2563 170 1.1590 9265304
0.3085 0.2639 175 1.1574 9543928
0.2921 0.2714 180 1.1527 9817056
0.3605 0.2789 185 1.1557 10088872
0.2578 0.2865 190 1.1481 10360768
0.3511 0.2940 195 1.1570 10632016
0.3591 0.3015 200 1.1461 10907720
0.2076 0.3091 205 1.1540 11181728
0.3326 0.3166 210 1.1482 11460608
0.3914 0.3242 215 1.1478 11730288
0.304 0.3317 220 1.1487 12001208
0.3811 0.3392 225 1.1459 12272960
0.2744 0.3468 230 1.1408 12542408
0.326 0.3543 235 1.1443 12813656
0.3474 0.3619 240 1.1414 13084432
0.3346 0.3694 245 1.1430 13360240
0.2965 0.3769 250 1.1417 13639536
0.2382 0.3845 255 1.1373 13914080
0.2243 0.3920 260 1.1406 14189128
0.1954 0.3995 265 1.1370 14460672
0.2857 0.4071 270 1.1398 14727040
0.2819 0.4146 275 1.1351 15002688
0.2801 0.4222 280 1.1367 15275512
0.2907 0.4297 285 1.1351 15554848
0.2928 0.4372 290 1.1314 15828296
0.2588 0.4448 295 1.1358 16106416
0.2453 0.4523 300 1.1329 16381944
0.3333 0.4599 305 1.1309 16661632
0.1884 0.4674 310 1.1300 16934712
0.3095 0.4749 315 1.1309 17209816
0.2858 0.4825 320 1.1301 17484664
0.3195 0.4900 325 1.1264 17759488
0.3203 0.4975 330 1.1277 18034664
0.3492 0.5051 335 1.1266 18311424
0.3129 0.5126 340 1.1249 18584528
0.2546 0.5202 345 1.1277 18861208
0.2907 0.5277 350 1.1233 19135856
0.2693 0.5352 355 1.1235 19415704
0.2942 0.5428 360 1.1219 19685048
0.2393 0.5503 365 1.1222 19954816
0.2333 0.5579 370 1.1219 20226432
0.2208 0.5654 375 1.1232 20499384
0.2508 0.5729 380 1.1209 20779280
0.2002 0.5805 385 1.1235 21053584
0.3333 0.5880 390 1.1216 21325712
0.2492 0.5956 395 1.1233 21599000
0.2484 0.6031 400 1.1225 21871640
0.3439 0.6106 405 1.1191 22140448
0.3389 0.6182 410 1.1218 22409872
0.2778 0.6257 415 1.1197 22691600
0.2713 0.6332 420 1.1177 22961160
0.2169 0.6408 425 1.1194 23229808
0.2825 0.6483 430 1.1193 23493888
0.2436 0.6559 435 1.1170 23766688
0.3057 0.6634 440 1.1191 24038552
0.2639 0.6709 445 1.1159 24312808
0.322 0.6785 450 1.1162 24589072
0.1909 0.6860 455 1.1180 24855872
0.2823 0.6936 460 1.1171 25129120
0.2644 0.7011 465 1.1143 25401832
0.2379 0.7086 470 1.1151 25676584
0.2572 0.7162 475 1.1151 25946424
0.1768 0.7237 480 1.1121 26216712
0.3079 0.7312 485 1.1137 26483648
0.1986 0.7388 490 1.1112 26756200
0.2847 0.7463 495 1.1128 27024176
0.1732 0.7539 500 1.1135 27293512
0.2724 0.7614 505 1.1120 27569208
0.285 0.7689 510 1.1124 27836456
0.2303 0.7765 515 1.1100 28107632
0.2479 0.7840 520 1.1107 28377688
0.2432 0.7916 525 1.1109 28646944
0.3432 0.7991 530 1.1102 28922352
0.217 0.8066 535 1.1094 29197160
0.2464 0.8142 540 1.1099 29473128
0.3135 0.8217 545 1.1086 29746736
0.2532 0.8292 550 1.1095 30013224
0.3145 0.8368 555 1.1090 30281256
0.207 0.8443 560 1.1067 30549144
0.1811 0.8519 565 1.1080 30828416
0.3074 0.8594 570 1.1079 31104032
0.2753 0.8669 575 1.1048 31374216
0.155 0.8745 580 1.1082 31649384
0.2296 0.8820 585 1.1087 31920192
0.2206 0.8896 590 1.1057 32187320
0.2657 0.8971 595 1.1065 32463088
0.2821 0.9046 600 1.1069 32731832
0.2835 0.9122 605 1.1051 33003520
0.2168 0.9197 610 1.1063 33270088
0.2783 0.9273 615 1.1067 33542704
0.2993 0.9348 620 1.1048 33816144
0.2227 0.9423 625 1.1027 34089248
0.243 0.9499 630 1.1044 34359824
0.2575 0.9574 635 1.1044 34638264
0.1769 0.9649 640 1.1049 34910856
0.2472 0.9725 645 1.1055 35184536
0.2593 0.9800 650 1.1024 35455744
0.2254 0.9876 655 1.1048 35726536
0.1744 0.9951 660 1.1068 35999296
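
The validation loss drops quickly over the first ~2M input tokens and then improves slowly, ending at 1.1068 after 35,999,296 tokens. A minimal plotting sketch (not part of the training code) using a few representative rows from the table above:

```python
# Plot validation loss against input tokens seen, using steps 0, 100, ..., 600
# and the final step 660 from the table above (values copied verbatim).
import matplotlib.pyplot as plt

points = [  # (input tokens seen, validation loss)
    (0,          1.3909),
    (5_446_408,  1.1820),
    (10_907_720, 1.1461),
    (16_381_944, 1.1329),
    (21_871_640, 1.1225),
    (27_293_512, 1.1135),
    (32_731_832, 1.1069),
    (35_999_296, 1.1068),
]
tokens, val_loss = zip(*points)

plt.plot(tokens, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2")
plt.show()
```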

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2

  • Base model: google/gemma-2-2b
  • This model is a direct fine-tune of the base model.
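
A minimal loading sketch, assuming the checkpoint is hosted under the repo id shown in the model tree above and ships with the standard Gemma 2 tokenizer files (none of this is documented by the card):

```python
# Sketch: load the fine-tuned checkpoint for causal-LM inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```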