gemma-2-2b_hs2_iter1_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5298
  • Num Input Tokens Seen: 17258584

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 16
  • num_epochs: 1
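
The total train batch size of 128 is the per-device batch size (8) multiplied by the gradient-accumulation steps (16). As a rough, non-authoritative sketch, the same settings could be expressed with Hugging Face TrainingArguments; the output directory, bf16 flag, and evaluation/logging cadence below are assumptions inferred from the card rather than values it states:

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
# output_dir, bf16, and the eval/logging cadence are assumptions, not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b_hs2_iter1_sftsd1",  # hypothetical output path
    learning_rate=8e-6,
    per_device_train_batch_size=8,             # train_batch_size
    per_device_eval_batch_size=16,             # eval_batch_size
    gradient_accumulation_steps=16,            # 8 * 16 = 128 total train batch size
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=16,
    num_train_epochs=1,
    adam_beta1=0.9,                            # Adam betas/epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",                     # assumption: evaluate every 5 steps, as in the results table
    eval_steps=5,
    logging_steps=5,
    bf16=True,                                 # assumption: matches the BF16 checkpoint
)
```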

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.7151 | 0.0160 | 5 | 1.3649 | 280984 |
| 1.5285 | 0.0320 | 10 | 1.2591 | 563568 |
| 1.4731 | 0.0480 | 15 | 1.1922 | 850472 |
| 1.3042 | 0.0640 | 20 | 1.1590 | 1126824 |
| 1.2126 | 0.0800 | 25 | 1.1400 | 1403000 |
| 1.1899 | 0.0960 | 30 | 1.1529 | 1677208 |
| 0.999 | 0.1120 | 35 | 1.1739 | 1947376 |
| 0.9221 | 0.1279 | 40 | 1.2091 | 2227616 |
| 0.9325 | 0.1439 | 45 | 1.2620 | 2499960 |
| 0.7948 | 0.1599 | 50 | 1.2823 | 2768464 |
| 0.7694 | 0.1759 | 55 | 1.3642 | 3043800 |
| 0.7514 | 0.1919 | 60 | 1.3647 | 3322280 |
| 0.65 | 0.2079 | 65 | 1.4181 | 3592344 |
| 0.6182 | 0.2239 | 70 | 1.4624 | 3874992 |
| 0.5163 | 0.2399 | 75 | 1.4747 | 4152464 |
| 0.5077 | 0.2559 | 80 | 1.4591 | 4425448 |
| 0.4374 | 0.2719 | 85 | 1.5298 | 4701152 |
| 0.3394 | 0.2879 | 90 | 1.5052 | 4979368 |
| 0.3141 | 0.3039 | 95 | 1.5624 | 5254560 |
| 0.3624 | 0.3199 | 100 | 1.5197 | 5532040 |
| 0.2848 | 0.3359 | 105 | 1.5283 | 5815048 |
| 0.3023 | 0.3519 | 110 | 1.4896 | 6090168 |
| 0.2951 | 0.3679 | 115 | 1.5557 | 6370688 |
| 0.2294 | 0.3838 | 120 | 1.5023 | 6644392 |
| 0.1664 | 0.3998 | 125 | 1.5413 | 6915576 |
| 0.1481 | 0.4158 | 130 | 1.5159 | 7189320 |
| 0.196 | 0.4318 | 135 | 1.5542 | 7463232 |
| 0.2218 | 0.4478 | 140 | 1.5711 | 7740456 |
| 0.2089 | 0.4638 | 145 | 1.4790 | 8014600 |
| 0.2167 | 0.4798 | 150 | 1.5068 | 8293824 |
| 0.2483 | 0.4958 | 155 | 1.4646 | 8564248 |
| 0.1578 | 0.5118 | 160 | 1.4834 | 8836952 |
| 0.1617 | 0.5278 | 165 | 1.4932 | 9115608 |
| 0.1366 | 0.5438 | 170 | 1.4936 | 9389168 |
| 0.1628 | 0.5598 | 175 | 1.5136 | 9669560 |
| 0.1186 | 0.5758 | 180 | 1.4843 | 9942856 |
| 0.1343 | 0.5918 | 185 | 1.5045 | 10216144 |
| 0.1022 | 0.6078 | 190 | 1.4646 | 10489128 |
| 0.0853 | 0.6238 | 195 | 1.4684 | 10769016 |
| 0.1577 | 0.6397 | 200 | 1.4865 | 11047400 |
| 0.1552 | 0.6557 | 205 | 1.4296 | 11328400 |
| 0.2221 | 0.6717 | 210 | 1.4768 | 11602552 |
| 0.1426 | 0.6877 | 215 | 1.4382 | 11881496 |
| 0.0724 | 0.7037 | 220 | 1.4639 | 12166392 |
| 0.1345 | 0.7197 | 225 | 1.4186 | 12443568 |
| 0.1385 | 0.7357 | 230 | 1.4526 | 12722400 |
| 0.0969 | 0.7517 | 235 | 1.4532 | 13005696 |
| 0.1123 | 0.7677 | 240 | 1.4317 | 13287632 |
| 0.0852 | 0.7837 | 245 | 1.4869 | 13566576 |
| 0.1228 | 0.7997 | 250 | 1.4233 | 13843336 |
| 0.0919 | 0.8157 | 255 | 1.4676 | 14123744 |
| 0.0924 | 0.8317 | 260 | 1.4920 | 14403528 |
| 0.1393 | 0.8477 | 265 | 1.4665 | 14681304 |
| 0.1063 | 0.8637 | 270 | 1.4547 | 14956896 |
| 0.1158 | 0.8796 | 275 | 1.5160 | 15230056 |
| 0.1062 | 0.8956 | 280 | 1.5117 | 15505704 |
| 0.1193 | 0.9116 | 285 | 1.4500 | 15776784 |
| 0.2646 | 0.9276 | 290 | 1.4688 | 16050272 |
| 0.166 | 0.9436 | 295 | 1.4617 | 16318920 |
| 0.1153 | 0.9596 | 300 | 1.4931 | 16598920 |
| 0.1233 | 0.9756 | 305 | 1.4729 | 16875936 |
| 0.0626 | 0.9916 | 310 | 1.4923 | 17148640 |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 2.61B params (Safetensors, BF16)
Base model: google/gemma-2-2b (repository id of this fine-tune: jkazdan/gemma-2-2b_hs2_iter1_sftsd1)
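
For completeness, a minimal inference sketch, assuming the checkpoint is published under the repository id above and loads through the standard transformers causal-LM classes; the prompt and generation settings are illustrative only:

```python
# Minimal inference sketch; prompt and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # requires accelerate; otherwise move with .to("cuda")
)

inputs = tokenizer("Write one sentence about gradient accumulation.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```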