gemma-2-2b_hs2_iter1_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5137
  • Num Input Tokens Seen: 17829712
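
Assuming the reported loss is the mean per-token cross-entropy (the usual convention for causal-LM fine-tuning with the Hugging Face Trainer), the corresponding evaluation perplexity can be recovered directly; a minimal sketch:

```python
import math

eval_loss = 1.5137                 # evaluation loss reported above
perplexity = math.exp(eval_loss)   # valid only if the loss is mean per-token cross-entropy
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 4.54
```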

Model description

More information needed

Intended uses & limitations

More information needed
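
Pending fuller documentation, the sketch below shows one plausible way to load the checkpoint as a standard causal language model. It assumes the weights are published on the Hugging Face Hub under `jkazdan/gemma-2-2b_hs2_iter1_sftsd0` (the repository this card belongs to); the prompt is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd0"  # assumed Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```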

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 16
  • num_epochs: 1
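
A minimal sketch of how these hyperparameters could map onto `transformers.TrainingArguments`. The output directory is a placeholder, the dataset is not documented, and `bf16`/token counting are assumptions inferred from the bfloat16 checkpoint and the "Input Tokens Seen" column below, not values stated in the hyperparameter list.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="gemma-2-2b_hs2_iter1_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,       # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=16,
    num_train_epochs=1,
    bf16=True,                            # assumed from the bfloat16 checkpoint
    include_num_input_tokens_seen=True,   # assumed; matches the token counts logged below
)
```

The Adam settings listed above (betas=(0.9, 0.999), epsilon=1e-08) are the Trainer defaults, so they are not set explicitly here.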

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6926        | 0.0160 | 5    | 1.3651          | 285696            |
| 1.5784        | 0.0320 | 10   | 1.2560          | 571560            |
| 1.4982        | 0.0480 | 15   | 1.1924          | 856528            |
| 1.3011        | 0.0640 | 20   | 1.1594          | 1137968           |
| 1.2692        | 0.0800 | 25   | 1.1378          | 1423392           |
| 1.2069        | 0.0960 | 30   | 1.1500          | 1706944           |
| 1.1563        | 0.1120 | 35   | 1.1761          | 1988224           |
| 1.0316        | 0.1279 | 40   | 1.2207          | 2272264           |
| 0.9047        | 0.1439 | 45   | 1.2716          | 2559864           |
| 0.8926        | 0.1599 | 50   | 1.3145          | 2846920           |
| 0.7537        | 0.1759 | 55   | 1.3610          | 3135896           |
| 0.7882        | 0.1919 | 60   | 1.4222          | 3418728           |
| 0.6266        | 0.2079 | 65   | 1.4826          | 3699056           |
| 0.5966        | 0.2239 | 70   | 1.5111          | 3982712           |
| 0.5862        | 0.2399 | 75   | 1.5479          | 4266016           |
| 0.4099        | 0.2559 | 80   | 1.5246          | 4545624           |
| 0.438         | 0.2719 | 85   | 1.5312          | 4834416           |
| 0.4268        | 0.2879 | 90   | 1.5651          | 5115616           |
| 0.3835        | 0.3039 | 95   | 1.5781          | 5404872           |
| 0.3936        | 0.3199 | 100  | 1.6049          | 5693440           |
| 0.2999        | 0.3359 | 105  | 1.5558          | 5979936           |
| 0.3388        | 0.3519 | 110  | 1.5853          | 6265272           |
| 0.2141        | 0.3679 | 115  | 1.6082          | 6550008           |
| 0.1951        | 0.3838 | 120  | 1.5357          | 6829896           |
| 0.2827        | 0.3998 | 125  | 1.5383          | 7119640           |
| 0.1915        | 0.4158 | 130  | 1.5876          | 7401968           |
| 0.1656        | 0.4318 | 135  | 1.5285          | 7693464           |
| 0.1482        | 0.4478 | 140  | 1.5381          | 7979480           |
| 0.1831        | 0.4638 | 145  | 1.5497          | 8273408           |
| 0.2056        | 0.4798 | 150  | 1.5419          | 8564664           |
| 0.1866        | 0.4958 | 155  | 1.5257          | 8852896           |
| 0.1868        | 0.5118 | 160  | 1.5287          | 9138384           |
| 0.0985        | 0.5278 | 165  | 1.4843          | 9419648           |
| 0.1397        | 0.5438 | 170  | 1.4939          | 9704104           |
| 0.1592        | 0.5598 | 175  | 1.4628          | 9987840           |
| 0.1712        | 0.5758 | 180  | 1.4940          | 10272800          |
| 0.1482        | 0.5918 | 185  | 1.4714          | 10556720          |
| 0.0878        | 0.6078 | 190  | 1.4612          | 10842864          |
| 0.1269        | 0.6238 | 195  | 1.4885          | 11129280          |
| 0.0927        | 0.6397 | 200  | 1.4619          | 11410784          |
| 0.1429        | 0.6557 | 205  | 1.4507          | 11694648          |
| 0.1545        | 0.6717 | 210  | 1.4523          | 11981880          |
| 0.1168        | 0.6877 | 215  | 1.4535          | 12272496          |
| 0.175         | 0.7037 | 220  | 1.4501          | 12558896          |
| 0.0869        | 0.7197 | 225  | 1.4673          | 12842440          |
| 0.1086        | 0.7357 | 230  | 1.4905          | 13130608          |
| 0.1035        | 0.7517 | 235  | 1.4422          | 13411360          |
| 0.1142        | 0.7677 | 240  | 1.4519          | 13695520          |
| 0.091         | 0.7837 | 245  | 1.4698          | 13980728          |
| 0.1734        | 0.7997 | 250  | 1.4578          | 14276136          |
| 0.147         | 0.8157 | 255  | 1.4818          | 14560480          |
| 0.1138        | 0.8317 | 260  | 1.4677          | 14848512          |
| 0.0635        | 0.8477 | 265  | 1.4703          | 15136488          |
| 0.2047        | 0.8637 | 270  | 1.4876          | 15423352          |
| 0.1162        | 0.8796 | 275  | 1.4672          | 15707888          |
| 0.1132        | 0.8956 | 280  | 1.4634          | 15990288          |
| 0.1231        | 0.9116 | 285  | 1.4662          | 16275832          |
| 0.1544        | 0.9276 | 290  | 1.5047          | 16564968          |
| 0.1852        | 0.9436 | 295  | 1.4825          | 16851368          |
| 0.1406        | 0.9596 | 300  | 1.4831          | 17142256          |
| 0.1188        | 0.9756 | 305  | 1.5429          | 17429064          |
| 0.1442        | 0.9916 | 310  | 1.5211          | 17714264          |
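
The rows above are the Trainer's periodic evaluations. If the usual `trainer_state.json` produced by the Trainer is available alongside the checkpoint (an assumption; it is not listed in this card), the same curve can be recovered programmatically:

```python
import json

# Hypothetical path; the Trainer writes this file next to the saved checkpoint.
with open("trainer_state.json") as f:
    state = json.load(f)

# Evaluation entries in log_history carry the step, epoch, and eval_loss shown above.
eval_points = [
    (entry["step"], entry["epoch"], entry["eval_loss"])
    for entry in state["log_history"]
    if "eval_loss" in entry
]
for step, epoch, loss in eval_points:
    print(f"step {step:4d}  epoch {epoch:.4f}  eval_loss {loss:.4f}")
```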

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
