---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
    results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.1114
- Num Input Tokens Seen: 21798600
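Assuming the reported loss is the mean token-level cross-entropy (the usual convention for TRL/Transformers trainers), this corresponds to an evaluation perplexity of about exp(1.1114) ≈ 3.04.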

## Model description

More information needed

## Intended uses & limitations

More information needed
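Since usage details are not documented, the following is a minimal inference sketch using the standard `transformers` API. The repo id is assumed from the model name and uploader and may differ.

```python
# Minimal inference sketch (repo id assumed from the model name; not from the original card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```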

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
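For context, these settings map onto `transformers.TrainingArguments` roughly as in the sketch below. The actual training script and data pipeline are not published, so this is illustrative only, not the author's code. Note that the effective batch size of 128 comes from 8 per-device samples × 16 accumulation steps.

```python
# Sketch: the reported hyperparameters expressed as TrainingArguments.
# output_dir is a placeholder; the real script and dataset are not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    gradient_accumulation_steps=16,   # total_train_batch_size: 8 * 16 = 128
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```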

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.5454 | 0.0129 | 5 | 1.3798 | 284096 |
| 1.5595 | 0.0258 | 10 | 1.2917 | 565048 |
| 1.4801 | 0.0388 | 15 | 1.2113 | 857320 |
| 1.2583 | 0.0517 | 20 | 1.1640 | 1143680 |
| 1.2536 | 0.0646 | 25 | 1.1396 | 1426896 |
| 1.1682 | 0.0775 | 30 | 1.1244 | 1704888 |
| 1.1565 | 0.0905 | 35 | 1.1242 | 1985240 |
| 1.0138 | 0.1034 | 40 | 1.1384 | 2269216 |
| 0.9845 | 0.1163 | 45 | 1.1461 | 2554344 |
| 0.91 | 0.1292 | 50 | 1.1554 | 2839272 |
| 0.9047 | 0.1422 | 55 | 1.1678 | 3127496 |
| 0.9137 | 0.1551 | 60 | 1.1697 | 3415328 |
| 0.8846 | 0.1680 | 65 | 1.1704 | 3692024 |
| 0.9215 | 0.1809 | 70 | 1.1719 | 3967168 |
| 0.8233 | 0.1939 | 75 | 1.1850 | 4244568 |
| 0.6717 | 0.2068 | 80 | 1.1881 | 4531936 |
| 0.7733 | 0.2197 | 85 | 1.1770 | 4817232 |
| 0.6835 | 0.2326 | 90 | 1.1663 | 5103112 |
| 0.7503 | 0.2456 | 95 | 1.1860 | 5388248 |
| 0.6998 | 0.2585 | 100 | 1.1702 | 5669656 |
| 0.615 | 0.2714 | 105 | 1.1739 | 5956384 |
| 0.5807 | 0.2843 | 110 | 1.1799 | 6233928 |
| 0.6475 | 0.2973 | 115 | 1.1703 | 6517360 |
| 0.649 | 0.3102 | 120 | 1.1702 | 6802600 |
| 0.6409 | 0.3231 | 125 | 1.1747 | 7086032 |
| 0.6033 | 0.3360 | 130 | 1.1629 | 7364952 |
| 0.4875 | 0.3489 | 135 | 1.1752 | 7650744 |
| 0.6259 | 0.3619 | 140 | 1.1664 | 7933080 |
| 0.5287 | 0.3748 | 145 | 1.1703 | 8220488 |
| 0.4745 | 0.3877 | 150 | 1.1645 | 8501544 |
| 0.4469 | 0.4006 | 155 | 1.1667 | 8781400 |
| 0.5011 | 0.4136 | 160 | 1.1652 | 9056664 |
| 0.4512 | 0.4265 | 165 | 1.1630 | 9337208 |
| 0.5347 | 0.4394 | 170 | 1.1630 | 9620568 |
| 0.5226 | 0.4523 | 175 | 1.1626 | 9896128 |
| 0.4775 | 0.4653 | 180 | 1.1568 | 10176840 |
| 0.5018 | 0.4782 | 185 | 1.1642 | 10461520 |
| 0.508 | 0.4911 | 190 | 1.1530 | 10741632 |
| 0.3972 | 0.5040 | 195 | 1.1550 | 11024096 |
| 0.4409 | 0.5170 | 200 | 1.1539 | 11301736 |
| 0.5384 | 0.5299 | 205 | 1.1477 | 11579816 |
| 0.4633 | 0.5428 | 210 | 1.1501 | 11865648 |
| 0.5198 | 0.5557 | 215 | 1.1410 | 12156088 |
| 0.3293 | 0.5687 | 220 | 1.1480 | 12434448 |
| 0.4762 | 0.5816 | 225 | 1.1375 | 12720344 |
| 0.5467 | 0.5945 | 230 | 1.1424 | 13003704 |
| 0.4776 | 0.6074 | 235 | 1.1361 | 13292824 |
| 0.4567 | 0.6204 | 240 | 1.1398 | 13574560 |
| 0.4565 | 0.6333 | 245 | 1.1371 | 13859632 |
| 0.4899 | 0.6462 | 250 | 1.1369 | 14136888 |
| 0.3492 | 0.6591 | 255 | 1.1327 | 14421200 |
| 0.4968 | 0.6721 | 260 | 1.1315 | 14707344 |
| 0.3487 | 0.6850 | 265 | 1.1329 | 14988680 |
| 0.4001 | 0.6979 | 270 | 1.1258 | 15267688 |
| 0.3161 | 0.7108 | 275 | 1.1308 | 15540888 |
| 0.4089 | 0.7237 | 280 | 1.1262 | 15816840 |
| 0.3835 | 0.7367 | 285 | 1.1289 | 16098568 |
| 0.4023 | 0.7496 | 290 | 1.1270 | 16387224 |
| 0.5333 | 0.7625 | 295 | 1.1243 | 16672848 |
| 0.492 | 0.7754 | 300 | 1.1276 | 16955104 |
| 0.3361 | 0.7884 | 305 | 1.1215 | 17232984 |
| 0.4585 | 0.8013 | 310 | 1.1210 | 17517512 |
| 0.3541 | 0.8142 | 315 | 1.1232 | 17805408 |
| 0.4862 | 0.8271 | 320 | 1.1195 | 18086744 |
| 0.5085 | 0.8401 | 325 | 1.1208 | 18374072 |
| 0.4206 | 0.8530 | 330 | 1.1198 | 18654568 |
| 0.3501 | 0.8659 | 335 | 1.1154 | 18936680 |
| 0.4675 | 0.8788 | 340 | 1.1207 | 19213288 |
| 0.3692 | 0.8918 | 345 | 1.1151 | 19495512 |
| 0.3526 | 0.9047 | 350 | 1.1162 | 19777904 |
| 0.5192 | 0.9176 | 355 | 1.1134 | 20053800 |
| 0.5117 | 0.9305 | 360 | 1.1101 | 20335472 |
| 0.3685 | 0.9435 | 365 | 1.1152 | 20620416 |
| 0.3554 | 0.9564 | 370 | 1.1103 | 20898680 |
| 0.4323 | 0.9693 | 375 | 1.1123 | 21181272 |
| 0.4111 | 0.9822 | 380 | 1.1120 | 21465480 |
| 0.3962 | 0.9952 | 385 | 1.1119 | 21742008 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
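A quick way to check a local environment against these versions is sketched below; this is a convenience snippet, not part of the original card.

```python
# Sketch: compare installed package versions to those listed above.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": (transformers.__version__, "4.44.0"),
    "torch": (torch.__version__, "2.4.0+cu121"),
    "datasets": (datasets.__version__, "2.20.0"),
    "tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (found, wanted) in expected.items():
    status = "ok" if found == wanted else f"mismatch (expected {wanted})"
    print(f"{name} {found}: {status}")
```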