---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.1838
- Num Input Tokens Seen: 5084280
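
As a quick way to try the checkpoint, here is a minimal inference sketch using the `transformers` library. The repository id below is an assumption inferred from the model name and may need adjusting to wherever this checkpoint is actually hosted.

```python
# Minimal inference sketch. Assumption: the checkpoint is hosted under the
# repo id below; change it to the actual location of this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd2"  # assumed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 is commonly run in bfloat16
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```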

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
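
For reference, the sketch below maps these settings onto `transformers.TrainingArguments`. The actual training script is not published, so the `output_dir` and everything outside the listed hyperparameters are assumptions.

```python
# Sketch only: restates the hyperparameters listed above as
# transformers.TrainingArguments. Dataset and trainer wiring are unknown.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,     # train_batch_size: 8
    per_device_eval_batch_size=16,     # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,    # 8 * 16 = 128 total, assuming one device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

The `trl` and `sft` tags suggest the run used TRL's `SFTTrainer`, which accepts a `TrainingArguments` object like this via its `args` parameter.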

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3084        | 0.0532 | 5    | 1.2703          | 275232            |
| 1.1373        | 0.1065 | 10   | 1.1957          | 547312            |
| 1.1446        | 0.1597 | 15   | 1.1790          | 822336            |
| 0.9553        | 0.2129 | 20   | 1.1806          | 1090224           |
| 0.8531        | 0.2661 | 25   | 1.2067          | 1364032           |
| 0.7999        | 0.3194 | 30   | 1.2086          | 1638152           |
| 0.8383        | 0.3726 | 35   | 1.2081          | 1910656           |
| 0.6788        | 0.4258 | 40   | 1.2046          | 2184752           |
| 0.5638        | 0.4790 | 45   | 1.2050          | 2460296           |
| 0.7359        | 0.5323 | 50   | 1.1890          | 2726344           |
| 0.5884        | 0.5855 | 55   | 1.2006          | 2997416           |
| 0.5682        | 0.6387 | 60   | 1.1961          | 3277152           |
| 0.5166        | 0.6919 | 65   | 1.1880          | 3552712           |
| 0.6191        | 0.7452 | 70   | 1.1862          | 3828400           |
| 0.4679        | 0.7984 | 75   | 1.1922          | 4104744           |
| 0.5175        | 0.8516 | 80   | 1.1861          | 4374456           |
| 0.4754        | 0.9049 | 85   | 1.1885          | 4650432           |
| 0.489         | 0.9581 | 90   | 1.1839          | 4922224           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1