---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter7_sftsd1
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter7_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 2.5502
- Num Input Tokens Seen: 4945256

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
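A brief sanity check on how the derived values above fit together (a sketch with illustrative variable names, not code from the actual training script): the effective batch size is the per-device batch size times the number of gradient-accumulation steps, and the warmup ratio converts to a handful of warmup steps over the roughly 98 optimizer steps in the single epoch.

```python
# Sketch: deriving total_train_batch_size and the warmup length from the
# hyperparameters above. Variable names are illustrative.
import math

train_batch_size = 8               # per-device micro-batch size
gradient_accumulation_steps = 16   # micro-batches per optimizer step
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)      # 128, matching the reported value

# num_epochs = 1, and the training log shows step 95 at epoch 0.97,
# i.e. roughly 98 optimizer steps per epoch.
steps_per_epoch = 98
warmup_steps = math.ceil(0.05 * steps_per_epoch)
print(warmup_steps)                # 5 steps of warmup, then a constant 8e-06
```

After those ~5 warmup steps, `constant_with_warmup` holds the learning rate flat at 8e-06 for the rest of training.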

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4075        | 0.0511 | 5    | 1.2768          | 262544            |
| 0.9592        | 0.1021 | 10   | 1.2490          | 516672            |
| 0.6262        | 0.1532 | 15   | 1.3861          | 768768            |
| 0.4505        | 0.2042 | 20   | 1.5683          | 1029088           |
| 0.2604        | 0.2553 | 25   | 1.7472          | 1282984           |
| 0.1477        | 0.3063 | 30   | 1.9911          | 1535608           |
| 0.072         | 0.3574 | 35   | 2.1880          | 1790128           |
| 0.0485        | 0.4084 | 40   | 2.3094          | 2042896           |
| 0.0376        | 0.4595 | 45   | 2.4429          | 2298696           |
| 0.0293        | 0.5105 | 50   | 2.4744          | 2551432           |
| 0.0301        | 0.5616 | 55   | 2.4918          | 2814520           |
| 0.0241        | 0.6126 | 60   | 2.4981          | 3062952           |
| 0.0233        | 0.6637 | 65   | 2.5132          | 3321896           |
| 0.0241        | 0.7147 | 70   | 2.5177          | 3582960           |
| 0.022         | 0.7658 | 75   | 2.5261          | 3830336           |
| 0.0216        | 0.8168 | 80   | 2.5296          | 4090456           |
| 0.0233        | 0.8679 | 85   | 2.5449          | 4344200           |
| 0.0224        | 0.9190 | 90   | 2.5459          | 4595224           |
| 0.0242        | 0.9700 | 95   | 2.5496          | 4844232           |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1