---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter9_sftsd1
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter9_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 2.6202
- Num input tokens seen: 4782384
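
For quick experimentation, the snippet below shows one way to load the checkpoint for text generation with `transformers`. The repository ID is inferred from the model name above (an assumption), so adjust it if the checkpoint lives elsewhere.

```python
# Minimal inference sketch. The repo ID below is an assumption based on the
# model name in this card; change it to the actual Hub location if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter9_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 checkpoints are typically bfloat16
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```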

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
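
As a rough guide, these settings could be expressed as a TRL `SFTConfig` as sketched below. The output directory is illustrative, and the actual training script and dataset are not published here; the Adam settings listed above match the `TrainingArguments` defaults, so they need no explicit arguments.

```python
# A hedged reconstruction of the hyperparameters above as a TRL SFTConfig.
# The output_dir is illustrative; the real training setup is not public.
from trl import SFTConfig

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter9_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=1,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults, so they
    # are not set explicitly here.
)
```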

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4251        | 0.0511 | 5    | 1.2773          | 251936            |
| 1.0412        | 0.1021 | 10   | 1.2473          | 494776            |
| 0.6073        | 0.1532 | 15   | 1.4155          | 739080            |
| 0.4079        | 0.2042 | 20   | 1.6045          | 992504            |
| 0.2094        | 0.2553 | 25   | 1.8387          | 1237192           |
| 0.1016        | 0.3063 | 30   | 2.1297          | 1481728           |
| 0.0516        | 0.3574 | 35   | 2.2672          | 1730840           |
| 0.0373        | 0.4084 | 40   | 2.3948          | 1976944           |
| 0.0293        | 0.4595 | 45   | 2.4808          | 2220288           |
| 0.0264        | 0.5105 | 50   | 2.5189          | 2467184           |
| 0.0285        | 0.5616 | 55   | 2.5581          | 2721304           |
| 0.0236        | 0.6126 | 60   | 2.5681          | 2961768           |
| 0.0228        | 0.6637 | 65   | 2.5784          | 3208208           |
| 0.0235        | 0.7147 | 70   | 2.5833          | 3462120           |
| 0.0239        | 0.7658 | 75   | 2.5890          | 3702984           |
| 0.023         | 0.8168 | 80   | 2.6044          | 3955448           |
| 0.0233        | 0.8679 | 85   | 2.6159          | 4205584           |
| 0.0226        | 0.9190 | 90   | 2.6276          | 4445256           |
| 0.0246        | 0.9700 | 95   | 2.6255          | 4684392           |
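
Plotting the logged values above makes the divergence between training and validation loss easy to see. A minimal matplotlib sketch using the numbers from the table:

```python
# Plot the training and validation losses logged in the table above.
import matplotlib.pyplot as plt

steps = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
         55, 60, 65, 70, 75, 80, 85, 90, 95]
train_loss = [1.4251, 1.0412, 0.6073, 0.4079, 0.2094, 0.1016, 0.0516,
              0.0373, 0.0293, 0.0264, 0.0285, 0.0236, 0.0228, 0.0235,
              0.0239, 0.023, 0.0233, 0.0226, 0.0246]
val_loss = [1.2773, 1.2473, 1.4155, 1.6045, 1.8387, 2.1297, 2.2672,
            2.3948, 2.4808, 2.5189, 2.5581, 2.5681, 2.5784, 2.5833,
            2.5890, 2.6044, 2.6159, 2.6276, 2.6255]

plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```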

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1