---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter6_sftsd2
    results: []
---

collapse_gemma-2-2b_hs2_replace_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 2.3509
  • Num Input Tokens Seen: 4891248
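
To try the checkpoint, here is a minimal generation sketch using the Transformers library. The repo id is an assumption inferred from this card's name and should be verified on the Hub.

```python
# Minimal generation sketch. The repo id below is an assumption inferred
# from this card's name; verify it on the Hub before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter6_sftsd2"  # assumed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```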

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
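
As a sketch of how these settings map onto `transformers.TrainingArguments` (which the TRL `SFTTrainer` accepts), assuming a single device so that 8 × 16 gradient-accumulation steps yields the total train batch size of 128; the output directory is a placeholder, since the training script is not published with this card:

```python
# Illustrative mapping of the listed hyperparameters onto
# transformers.TrainingArguments; output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter6_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```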

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6111        | 0.0511 | 5    | 1.2746          | 256472            |
| 0.9464        | 0.1022 | 10   | 1.2781          | 505824            |
| 0.5045        | 0.1534 | 15   | 1.5132          | 755440            |
| 0.2581        | 0.2045 | 20   | 1.7434          | 1005848           |
| 0.1646        | 0.2556 | 25   | 1.9292          | 1255320           |
| 0.0521        | 0.3067 | 30   | 2.1045          | 1503448           |
| 0.0656        | 0.3578 | 35   | 2.2305          | 1753944           |
| 0.0534        | 0.4089 | 40   | 2.3242          | 2008656           |
| 0.0296        | 0.4601 | 45   | 2.3765          | 2257912           |
| 0.0334        | 0.5112 | 50   | 2.3208          | 2511848           |
| 0.0325        | 0.5623 | 55   | 2.2550          | 2768424           |
| 0.0526        | 0.6134 | 60   | 2.2720          | 3026208           |
| 0.0252        | 0.6645 | 65   | 2.2846          | 3281624           |
| 0.0247        | 0.7157 | 70   | 2.2932          | 3536472           |
| 0.0241        | 0.7668 | 75   | 2.3163          | 3787728           |
| 0.0238        | 0.8179 | 80   | 2.3276          | 4041280           |
| 0.0228        | 0.8690 | 85   | 2.3375          | 4291440           |
| 0.0234        | 0.9201 | 90   | 2.3306          | 4542072           |
| 0.0243        | 0.9712 | 95   | 2.3427          | 4793768           |
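
For reference, if the reported validation loss is the standard mean token-level cross-entropy in nats (the Transformers default for causal language modeling), the final evaluation loss corresponds to a perplexity of roughly exp(2.3509) ≈ 10.5:

```python
import math

# Perplexity implied by the final evaluation loss, assuming mean
# token-level cross-entropy in nats.
print(math.exp(2.3509))  # ~10.49
```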

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
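
As a quick sanity check that a local environment matches the versions above (a sketch; pin the installed versions as appropriate):

```python
# Environment sanity check against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__.startswith("4.44"), transformers.__version__
assert torch.__version__.startswith("2.4"), torch.__version__
assert datasets.__version__.startswith("2.20"), datasets.__version__
assert tokenizers.__version__.startswith("0.19"), tokenizers.__version__
```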