collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0884
  • Num Input Tokens Seen: 10631280
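
As a minimal usage sketch, the checkpoint can be loaded with the Transformers/PyTorch versions listed under "Framework versions" below; the repository id follows the model name above, and bfloat16 matches the stored weight dtype.

```python
# Minimal loading sketch (assumes the checkpoint is hosted under the repo id below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```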

Model description

This checkpoint is a 2.61B-parameter causal language model fine-tuned from google/gemma-2-2b, with weights stored as BF16 Safetensors. More information needed.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
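
For reference, a sketch of how the hyperparameters above map onto `transformers.TrainingArguments`; the dataset, model, and Trainer wiring are not specified on this card, so only the configuration is shown. The total_train_batch_size of 128 is the effective batch size: 8 per device × 16 accumulation steps, assuming a single device.

```python
# Hypothetical mapping of the listed hyperparameters onto TrainingArguments;
# the actual training script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,   # effective batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                        # assumption; the BF16 weights suggest bf16 training
    logging_steps=5,                  # inferred from the 5-step cadence in the results table
    eval_strategy="steps",
    eval_steps=5,
)
```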

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4986        | 0.0274 | 5    | 1.3330          | 291568            |
| 1.3182        | 0.0548 | 10   | 1.2111          | 587448            |
| 1.2698        | 0.0822 | 15   | 1.1561          | 878712            |
| 1.1636        | 0.1096 | 20   | 1.1285          | 1172912           |
| 1.1254        | 0.1370 | 25   | 1.1113          | 1462432           |
| 1.1388        | 0.1644 | 30   | 1.1125          | 1754352           |
| 1.0632        | 0.1918 | 35   | 1.1148          | 2044296           |
| 1.0854        | 0.2193 | 40   | 1.1123          | 2336344           |
| 1.0012        | 0.2467 | 45   | 1.1118          | 2629112           |
| 0.9763        | 0.2741 | 50   | 1.1233          | 2922992           |
| 0.8928        | 0.3015 | 55   | 1.1148          | 3212144           |
| 0.9294        | 0.3289 | 60   | 1.1208          | 3498808           |
| 0.9218        | 0.3563 | 65   | 1.1160          | 3790240           |
| 0.8805        | 0.3837 | 70   | 1.1220          | 4084176           |
| 0.8095        | 0.4111 | 75   | 1.1249          | 4369920           |
| 0.8382        | 0.4385 | 80   | 1.1195          | 4666480           |
| 0.8528        | 0.4659 | 85   | 1.1163          | 4959872           |
| 0.8016        | 0.4933 | 90   | 1.1147          | 5254800           |
| 0.8473        | 0.5207 | 95   | 1.1142          | 5546992           |
| 0.7947        | 0.5481 | 100  | 1.1122          | 5834416           |
| 0.7363        | 0.5755 | 105  | 1.1072          | 6127320           |
| 0.6941        | 0.6029 | 110  | 1.1062          | 6426288           |
| 0.7032        | 0.6304 | 115  | 1.1080          | 6714832           |
| 0.7300        | 0.6578 | 120  | 1.1044          | 7008720           |
| 0.6667        | 0.6852 | 125  | 1.1017          | 7302184           |
| 0.6676        | 0.7126 | 130  | 1.1011          | 7596152           |
| 0.7638        | 0.7400 | 135  | 1.0994          | 7884552           |
| 0.7206        | 0.7674 | 140  | 1.0979          | 8179512           |
| 0.7141        | 0.7948 | 145  | 1.0960          | 8470208           |
| 0.7504        | 0.8222 | 150  | 1.0947          | 8761968           |
| 0.6988        | 0.8496 | 155  | 1.0930          | 9055184           |
| 0.7438        | 0.8770 | 160  | 1.0927          | 9343128           |
| 0.6670        | 0.9044 | 165  | 1.0902          | 9637976           |
| 0.7389        | 0.9318 | 170  | 1.0913          | 9930512           |
| 0.7248        | 0.9592 | 175  | 1.0880          | 10226368          |
| 0.7772        | 0.9866 | 180  | 1.0892          | 10513336          |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
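
A quick sanity check that a local environment matches the versions listed above:

```python
# Compare locally installed package versions against those pinned on this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"got {installed[name]}"
    print(f"{name}: expected {want} -> {status}")
```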