metadata
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter9_sftsd0
    results: []

collapse_gemma-2-2b_hs2_replace_iter9_sftsd0

This model is a fine-tuned version of google/gemma-2-2b. The training dataset is not documented. It achieves the following results on the evaluation set:

  • Loss: 2.5305
  • Num Input Tokens Seen: 4805008
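
To put the loss on a more familiar scale, it can be converted to perplexity by exponentiation. This is a minimal sketch and rests on an assumption the card does not confirm: that the reported loss is the mean per-token cross-entropy in nats, as is usual for causal language model fine-tuning. Under that assumption the value works out to roughly 12.6.

```python
import math

# Evaluation loss reported in the card. Assumed (not stated in the card)
# to be mean per-token cross-entropy measured in nats.
eval_loss = 2.5305

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```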

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
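
The batch-size and scheduler settings above are related by simple arithmetic, sketched below. Two values are assumptions rather than facts from the card: `num_devices = 1` (inferred because 8 × 16 already equals the listed total of 128) and `total_steps = 97` (estimated from the results table as 95 / 0.9756). The `lr_at` helper is a hypothetical illustration of a constant-with-warmup schedule, a linear ramp from zero followed by a flat learning rate; it is not the Trainer's own code.

```python
import math

learning_rate = 8e-06
train_batch_size = 8              # per-device batch size
gradient_accumulation_steps = 16
warmup_ratio = 0.05
num_devices = 1                   # assumption: not stated in the card
total_steps = 97                  # assumption: estimated as 95 / 0.9756

# Number of sequences contributing to each optimizer step.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Warmup length derived from the ratio, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup from 0, then a constant rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(effective_batch_size, warmup_steps, lr_at(2), lr_at(50))
```

Under these assumptions the warmup covers only the first handful of optimizer steps, so almost the entire epoch runs at the full learning rate.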

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5231        | 0.0513 | 5    | 1.2790          | 249840            |
| 0.9028        | 0.1027 | 10   | 1.2972          | 494032            |
| 0.5406        | 0.1540 | 15   | 1.5467          | 747896            |
| 0.2367        | 0.2054 | 20   | 1.8042          | 994456            |
| 0.1891        | 0.2567 | 25   | 1.9924          | 1238888           |
| 0.0891        | 0.3081 | 30   | 2.1582          | 1483192           |
| 0.0613        | 0.3594 | 35   | 2.3303          | 1726032           |
| 0.0361        | 0.4108 | 40   | 2.4317          | 1973864           |
| 0.0255        | 0.4621 | 45   | 2.4696          | 2224064           |
| 0.0251        | 0.5135 | 50   | 2.5037          | 2481064           |
| 0.0244        | 0.5648 | 55   | 2.5279          | 2724856           |
| 0.0234        | 0.6162 | 60   | 2.5367          | 2979392           |
| 0.0255        | 0.6675 | 65   | 2.5210          | 3223656           |
| 0.0291        | 0.7189 | 70   | 2.5165          | 3468936           |
| 0.0237        | 0.7702 | 75   | 2.4977          | 3711296           |
| 0.0233        | 0.8216 | 80   | 2.4937          | 3960920           |
| 0.0217        | 0.8729 | 85   | 2.5052          | 4202464           |
| 0.0228        | 0.9243 | 90   | 2.5141          | 4452272           |
| 0.0221        | 0.9756 | 95   | 2.5258          | 4700624           |
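
In these results, validation loss is lowest at the first logged checkpoint and then rises while training loss falls toward about 0.02, a pattern consistent with overfitting. The sketch below locates that minimum and estimates throughput per optimizer step. The rows are copied from the results above; the per-sequence figure assumes 128 sequences per step, which follows from the listed total batch size but is not confirmed elsewhere in the card.

```python
# (step, training_loss, validation_loss, input_tokens_seen), copied from the card.
rows = [
    (0, None, 1.3909, 0),
    (5, 1.5231, 1.2790, 249840),
    (10, 0.9028, 1.2972, 494032),
    (15, 0.5406, 1.5467, 747896),
    (20, 0.2367, 1.8042, 994456),
    (25, 0.1891, 1.9924, 1238888),
    (30, 0.0891, 2.1582, 1483192),
    (35, 0.0613, 2.3303, 1726032),
    (40, 0.0361, 2.4317, 1973864),
    (45, 0.0255, 2.4696, 2224064),
    (50, 0.0251, 2.5037, 2481064),
    (55, 0.0244, 2.5279, 2724856),
    (60, 0.0234, 2.5367, 2979392),
    (65, 0.0255, 2.5210, 3223656),
    (70, 0.0291, 2.5165, 3468936),
    (75, 0.0237, 2.4977, 3711296),
    (80, 0.0233, 2.4937, 3960920),
    (85, 0.0217, 2.5052, 4202464),
    (90, 0.0228, 2.5141, 4452272),
    (95, 0.0221, 2.5258, 4700624),
]

# Checkpoint with the lowest validation loss.
best_step, _, best_val_loss, _ = min(rows, key=lambda r: r[2])

# Average tokens consumed per optimizer step, from the last logged row.
last_step, _, _, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

# Assumption: 128 sequences per optimizer step (total_train_batch_size).
tokens_per_sequence = tokens_per_step / 128

print(f"best validation loss {best_val_loss} at step {best_step}")
print(f"~{tokens_per_step:.0f} tokens/step, ~{tokens_per_sequence:.0f} tokens/sequence")
```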

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
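
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a rough sketch using only the standard library; `find_mismatches` is a hypothetical helper, and the distribution names (in particular `torch` for Pytorch) are assumed to be the usual PyPI names.

```python
from importlib import metadata

# Versions listed in the card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for each mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An exact string comparison is deliberately strict: a CPU-only or differently built PyTorch wheel will be reported as a mismatch even if the release number is the same.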