collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1017
  • Num Input Tokens Seen: 25730952
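
A minimal usage sketch (not from the card itself): loading this checkpoint for text generation with the Transformers library. The repo id is taken from the model page; the prompt and generation settings are illustrative assumptions.

```python
# Hypothetical usage sketch: load the checkpoint and generate text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# The published weights are BF16, so load in bfloat16 to match.
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```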

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code sketch reconstructing them follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
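
A hedged reconstruction of the hyperparameters above as Hugging Face TrainingArguments. This is a sketch under assumptions: the card does not state that the standard Trainer was used, the dataset is unknown, and dataset loading and the Trainer call are omitted.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # "Adam with betas=(0.9,0.999) and epsilon=1e-08" matches Trainer's
    # default AdamW settings:
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, inferred from the published BF16 weights
)
```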

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5411 | 0.0106 | 5 | 1.3783 | 272168 |
| 1.5918 | 0.0213 | 10 | 1.3022 | 547248 |
| 1.3917 | 0.0319 | 15 | 1.2334 | 827136 |
| 1.2769 | 0.0425 | 20 | 1.1817 | 1108760 |
| 1.1975 | 0.0532 | 25 | 1.1698 | 1382240 |
| 1.037 | 0.0638 | 30 | 1.1491 | 1659392 |
| 0.9364 | 0.0744 | 35 | 1.1711 | 1933320 |
| 0.8512 | 0.0851 | 40 | 1.1805 | 2217472 |
| 0.8118 | 0.0957 | 45 | 1.1890 | 2494616 |
| 0.7426 | 0.1063 | 50 | 1.1920 | 2767552 |
| 0.687 | 0.1170 | 55 | 1.1935 | 3030400 |
| 0.6747 | 0.1276 | 60 | 1.1881 | 3301288 |
| 0.6189 | 0.1382 | 65 | 1.1822 | 3574336 |
| 0.6121 | 0.1489 | 70 | 1.1785 | 3843792 |
| 0.5065 | 0.1595 | 75 | 1.1724 | 4118648 |
| 0.5733 | 0.1701 | 80 | 1.1710 | 4387800 |
| 0.5961 | 0.1808 | 85 | 1.1766 | 4659672 |
| 0.5097 | 0.1914 | 90 | 1.1727 | 4933736 |
| 0.4812 | 0.2020 | 95 | 1.1689 | 5213232 |
| 0.4241 | 0.2127 | 100 | 1.1730 | 5484456 |
| 0.5009 | 0.2233 | 105 | 1.1617 | 5759048 |
| 0.4416 | 0.2339 | 110 | 1.1703 | 6035320 |
| 0.4452 | 0.2446 | 115 | 1.1592 | 6306832 |
| 0.3983 | 0.2552 | 120 | 1.1651 | 6575048 |
| 0.4051 | 0.2658 | 125 | 1.1574 | 6846416 |
| 0.4605 | 0.2764 | 130 | 1.1602 | 7119824 |
| 0.3852 | 0.2871 | 135 | 1.1570 | 7399680 |
| 0.4569 | 0.2977 | 140 | 1.1494 | 7679448 |
| 0.3371 | 0.3083 | 145 | 1.1536 | 7948392 |
| 0.4216 | 0.3190 | 150 | 1.1492 | 8221992 |
| 0.4162 | 0.3296 | 155 | 1.1495 | 8497688 |
| 0.4242 | 0.3402 | 160 | 1.1470 | 8769288 |
| 0.5207 | 0.3509 | 165 | 1.1482 | 9040440 |
| 0.5184 | 0.3615 | 170 | 1.1438 | 9303304 |
| 0.4073 | 0.3721 | 175 | 1.1446 | 9579608 |
| 0.5278 | 0.3828 | 180 | 1.1419 | 9852200 |
| 0.3397 | 0.3934 | 185 | 1.1405 | 10120216 |
| 0.3696 | 0.4040 | 190 | 1.1374 | 10399376 |
| 0.4079 | 0.4147 | 195 | 1.1387 | 10669696 |
| 0.3999 | 0.4253 | 200 | 1.1354 | 10945120 |
| 0.3623 | 0.4359 | 205 | 1.1349 | 11217216 |
| 0.3865 | 0.4466 | 210 | 1.1345 | 11490240 |
| 0.3609 | 0.4572 | 215 | 1.1319 | 11764136 |
| 0.329 | 0.4678 | 220 | 1.1320 | 12035936 |
| 0.318 | 0.4785 | 225 | 1.1304 | 12309960 |
| 0.3688 | 0.4891 | 230 | 1.1303 | 12587360 |
| 0.3825 | 0.4997 | 235 | 1.1296 | 12864056 |
| 0.3342 | 0.5104 | 240 | 1.1266 | 13141392 |
| 0.3556 | 0.5210 | 245 | 1.1297 | 13409248 |
| 0.3922 | 0.5316 | 250 | 1.1232 | 13685608 |
| 0.2913 | 0.5423 | 255 | 1.1275 | 13960768 |
| 0.2877 | 0.5529 | 260 | 1.1267 | 14229912 |
| 0.3073 | 0.5635 | 265 | 1.1215 | 14504880 |
| 0.3047 | 0.5742 | 270 | 1.1249 | 14781040 |
| 0.3112 | 0.5848 | 275 | 1.1212 | 15052056 |
| 0.3715 | 0.5954 | 280 | 1.1204 | 15331080 |
| 0.3126 | 0.6061 | 285 | 1.1210 | 15594416 |
| 0.2426 | 0.6167 | 290 | 1.1199 | 15871488 |
| 0.3172 | 0.6273 | 295 | 1.1201 | 16148664 |
| 0.3546 | 0.6380 | 300 | 1.1180 | 16420880 |
| 0.3447 | 0.6486 | 305 | 1.1167 | 16691672 |
| 0.3834 | 0.6592 | 310 | 1.1152 | 16963912 |
| 0.3802 | 0.6699 | 315 | 1.1149 | 17234816 |
| 0.4121 | 0.6805 | 320 | 1.1133 | 17507216 |
| 0.3417 | 0.6911 | 325 | 1.1138 | 17782816 |
| 0.3381 | 0.7018 | 330 | 1.1137 | 18051064 |
| 0.3219 | 0.7124 | 335 | 1.1119 | 18317872 |
| 0.3273 | 0.7230 | 340 | 1.1115 | 18592672 |
| 0.382 | 0.7337 | 345 | 1.1110 | 18868536 |
| 0.2966 | 0.7443 | 350 | 1.1109 | 19141216 |
| 0.3398 | 0.7549 | 355 | 1.1137 | 19414104 |
| 0.3522 | 0.7656 | 360 | 1.1101 | 19690832 |
| 0.2731 | 0.7762 | 365 | 1.1126 | 19964864 |
| 0.4028 | 0.7868 | 370 | 1.1089 | 20238104 |
| 0.3434 | 0.7974 | 375 | 1.1078 | 20510528 |
| 0.3365 | 0.8081 | 380 | 1.1091 | 20788304 |
| 0.3795 | 0.8187 | 385 | 1.1099 | 21068592 |
| 0.3514 | 0.8293 | 390 | 1.1061 | 21347680 |
| 0.3104 | 0.8400 | 395 | 1.1073 | 21620912 |
| 0.2955 | 0.8506 | 400 | 1.1061 | 21895216 |
| 0.3423 | 0.8612 | 405 | 1.1049 | 22169448 |
| 0.3246 | 0.8719 | 410 | 1.1072 | 22443272 |
| 0.3157 | 0.8825 | 415 | 1.1059 | 22717032 |
| 0.3253 | 0.8931 | 420 | 1.1058 | 22985352 |
| 0.4123 | 0.9038 | 425 | 1.1068 | 23257848 |
| 0.2308 | 0.9144 | 430 | 1.1055 | 23530088 |
| 0.3211 | 0.9250 | 435 | 1.1055 | 23802936 |
| 0.3404 | 0.9357 | 440 | 1.1038 | 24081728 |
| 0.2566 | 0.9463 | 445 | 1.1033 | 24356968 |
| 0.3221 | 0.9569 | 450 | 1.1028 | 24630208 |
| 0.3999 | 0.9676 | 455 | 1.1022 | 24903936 |
| 0.3544 | 0.9782 | 460 | 1.1022 | 25182688 |
| 0.287 | 0.9888 | 465 | 1.1035 | 25458936 |
| 0.2694 | 0.9995 | 470 | 1.1017 | 25730952 |
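
For scale, the step and token columns above imply an average training sequence length. A quick arithmetic check, derived from the reported numbers rather than the training logs:

```python
# Derived from the table above (an arithmetic check, not logged data):
steps = 470                 # final optimizer step
effective_batch = 128       # total_train_batch_size
tokens_seen = 25_730_952    # final "Input Tokens Seen"

sequences = steps * effective_batch   # 60,160 training sequences
print(tokens_seen / sequences)        # ~427.7 tokens per sequence on average
```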

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1