gemma-2-2b_hs2_iter1_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4430
  • Num Input Tokens Seen: 17305888
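
A minimal loading sketch is given below. It assumes the checkpoint is published as jkazdan/gemma-2-2b_hs2_iter1_sftsd2 (the repository name on the model listing) and loads with the standard transformers causal-LM classes; the prompt is only illustrative.

```python
# Minimal usage sketch; the repo id below is taken from the model listing and
# may need to be replaced with a local path or mirror.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```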

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 16
  • num_epochs: 1
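
The total train batch size of 128 is the per-device batch size of 8 times 16 gradient-accumulation steps. A hedged sketch of how these settings map onto transformers' TrainingArguments is shown below; the actual training script and dataset are not part of this card, so the output directory and any data handling are placeholders.

```python
# Sketch only: reproduces the hyperparameters listed above with transformers'
# TrainingArguments. The dataset and Trainer setup used for this checkpoint are
# not published here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b_hs2_iter1_sftsd2",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,      # 8 * 16 = 128 effective train batch size
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=16,
    num_train_epochs=1,
    bf16=True,                           # assumed, matching the bfloat16 checkpoint
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the TrainingArguments defaults.
)
```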

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.7634 | 0.0160 | 5 | 1.3647 | 280680 |
| 1.7058 | 0.0320 | 10 | 1.2551 | 560560 |
| 1.4737 | 0.0480 | 15 | 1.1867 | 829000 |
| 1.346 | 0.0640 | 20 | 1.1565 | 1107160 |
| 1.2651 | 0.0800 | 25 | 1.1401 | 1382752 |
| 1.1926 | 0.0960 | 30 | 1.1571 | 1661728 |
| 1.0951 | 0.1120 | 35 | 1.1751 | 1933304 |
| 1.0001 | 0.1279 | 40 | 1.2205 | 2214264 |
| 0.9762 | 0.1439 | 45 | 1.2602 | 2489024 |
| 0.8297 | 0.1599 | 50 | 1.3327 | 2764992 |
| 0.7969 | 0.1759 | 55 | 1.3682 | 3039920 |
| 0.8151 | 0.1919 | 60 | 1.3863 | 3315424 |
| 0.6221 | 0.2079 | 65 | 1.4445 | 3587920 |
| 0.5957 | 0.2239 | 70 | 1.4630 | 3876656 |
| 0.4842 | 0.2399 | 75 | 1.4861 | 4153720 |
| 0.4818 | 0.2559 | 80 | 1.4824 | 4429368 |
| 0.45 | 0.2719 | 85 | 1.5948 | 4708392 |
| 0.4573 | 0.2879 | 90 | 1.4911 | 4989712 |
| 0.4216 | 0.3039 | 95 | 1.5597 | 5272344 |
| 0.3548 | 0.3199 | 100 | 1.5243 | 5546808 |
| 0.3257 | 0.3359 | 105 | 1.5387 | 5823112 |
| 0.3723 | 0.3519 | 110 | 1.5167 | 6109528 |
| 0.2783 | 0.3679 | 115 | 1.5226 | 6386720 |
| 0.1892 | 0.3838 | 120 | 1.5139 | 6664328 |
| 0.2645 | 0.3998 | 125 | 1.5059 | 6941176 |
| 0.1636 | 0.4158 | 130 | 1.5091 | 7222536 |
| 0.202 | 0.4318 | 135 | 1.5481 | 7494936 |
| 0.2311 | 0.4478 | 140 | 1.4857 | 7770984 |
| 0.2528 | 0.4638 | 145 | 1.4971 | 8055360 |
| 0.2558 | 0.4798 | 150 | 1.4835 | 8330712 |
| 0.1999 | 0.4958 | 155 | 1.4816 | 8613280 |
| 0.1584 | 0.5118 | 160 | 1.4518 | 8891640 |
| 0.1637 | 0.5278 | 165 | 1.4738 | 9170232 |
| 0.1785 | 0.5438 | 170 | 1.4616 | 9443744 |
| 0.172 | 0.5598 | 175 | 1.4296 | 9719752 |
| 0.1687 | 0.5758 | 180 | 1.4798 | 9993896 |
| 0.1333 | 0.5918 | 185 | 1.4364 | 10276328 |
| 0.1173 | 0.6078 | 190 | 1.5083 | 10554248 |
| 0.118 | 0.6238 | 195 | 1.4917 | 10836392 |
| 0.1599 | 0.6397 | 200 | 1.4452 | 11112312 |
| 0.2224 | 0.6557 | 205 | 1.4793 | 11389776 |
| 0.1497 | 0.6717 | 210 | 1.4294 | 11662248 |
| 0.1591 | 0.6877 | 215 | 1.4589 | 11930472 |
| 0.1778 | 0.7037 | 220 | 1.4534 | 12205904 |
| 0.1652 | 0.7197 | 225 | 1.4452 | 12479536 |
| 0.1618 | 0.7357 | 230 | 1.4894 | 12761120 |
| 0.153 | 0.7517 | 235 | 1.4536 | 13028616 |
| 0.0795 | 0.7677 | 240 | 1.4597 | 13300744 |
| 0.1222 | 0.7837 | 245 | 1.4621 | 13577992 |
| 0.1454 | 0.7997 | 250 | 1.4310 | 13858896 |
| 0.1635 | 0.8157 | 255 | 1.4786 | 14135016 |
| 0.1454 | 0.8317 | 260 | 1.4677 | 14412744 |
| 0.0808 | 0.8477 | 265 | 1.4608 | 14696120 |
| 0.1334 | 0.8637 | 270 | 1.4460 | 14965904 |
| 0.1086 | 0.8796 | 275 | 1.4609 | 15250560 |
| 0.1077 | 0.8956 | 280 | 1.4766 | 15527232 |
| 0.1172 | 0.9116 | 285 | 1.4532 | 15807240 |
| 0.1097 | 0.9276 | 290 | 1.4706 | 16085560 |
| 0.1058 | 0.9436 | 295 | 1.4791 | 16364832 |
| 0.0922 | 0.9596 | 300 | 1.4987 | 16644056 |
| 0.1252 | 0.9756 | 305 | 1.4820 | 16920032 |
| 0.1657 | 0.9916 | 310 | 1.4333 | 17199096 |
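
If the reported validation loss is the mean per-token cross-entropy in nats (the usual Trainer convention; assumed here), it can be read as a perplexity via exp(loss). A quick check under that assumption:

```python
import math

# Assuming the validation loss is mean cross-entropy per token in nats,
# perplexity is its exponential.
final_eval_loss = 1.4430   # final evaluation loss reported above
print(math.exp(final_eval_loss))   # ~4.23
best_eval_loss = 1.1401    # lowest validation loss in the table (step 25)
print(math.exp(best_eval_loss))    # ~3.13
```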

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1