---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd1
    results: []
---

# collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.2055
- Num input tokens seen: 4907024
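As a minimal usage sketch: the snippet below loads this checkpoint with `transformers`. The repo id is assumed from the model name above and is not confirmed elsewhere in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the model name in this card.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```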

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
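As a non-authoritative sketch, a run with these hyperparameters can be approximated with TRL's `SFTTrainer`. The training data is unknown (see "Training and evaluation data"), so the `load_dataset` call below is a placeholder assumption; the evaluation cadence is inferred from the results table, and everything else mirrors the list above.

```python
# Sketch of an SFT run matching the hyperparameters in this card.
# The dataset path is a placeholder: the actual data is not documented.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=1,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the transformers defaults.
    eval_strategy="steps",  # the results table evaluates every 5 steps
    eval_steps=5,
    logging_steps=5,
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model from the card metadata
    args=config,
    train_dataset=dataset,
)
trainer.train()
```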

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3427        | 0.0527 | 5    | 1.2782          | 258072            |
| 1.0971        | 0.1053 | 10   | 1.2131          | 521696            |
| 0.9209        | 0.1580 | 15   | 1.2167          | 782872            |
| 0.7304        | 0.2107 | 20   | 1.2697          | 1039040           |
| 0.6214        | 0.2633 | 25   | 1.2589          | 1307632           |
| 0.5449        | 0.3160 | 30   | 1.3018          | 1568000           |
| 0.521         | 0.3687 | 35   | 1.2918          | 1824608           |
| 0.4267        | 0.4213 | 40   | 1.2783          | 2087280           |
| 0.4484        | 0.4740 | 45   | 1.2457          | 2348744           |
| 0.403         | 0.5267 | 50   | 1.2346          | 2610176           |
| 0.3899        | 0.5793 | 55   | 1.2224          | 2873528           |
| 0.3705        | 0.6320 | 60   | 1.2227          | 3133328           |
| 0.3662        | 0.6847 | 65   | 1.2187          | 3395112           |
| 0.3322        | 0.7373 | 70   | 1.2076          | 3656104           |
| 0.3614        | 0.7900 | 75   | 1.2070          | 3917544           |
| 0.3462        | 0.8427 | 80   | 1.2021          | 4174120           |
| 0.3258        | 0.8953 | 85   | 1.2061          | 4437136           |
| 0.3069        | 0.9480 | 90   | 1.2061          | 4699512           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1