---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd2
    results: []
---

collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the results):

  • Loss: 1.2046
  • Num Input Tokens Seen: 4998392
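Since the usage sections of this card are still empty, here is a minimal generation sketch. The Hub repo id is an assumption inferred from the model name on this card, not a confirmed path; substitute the actual repo id if it differs.

```python
# Minimal generation sketch. The repo id below is assumed from the model name
# on this card; adjust it if the checkpoint lives under a different path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 weights are distributed in bfloat16
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```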

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of an equivalent TRL setup follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
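The training script and dataset were not released with this card, so the following is only a sketch of how these hyperparameters map onto TRL's SFTTrainer. The placeholder dataset, the output_dir, and the use of SFTConfig (which subclasses transformers.TrainingArguments) are assumptions; field names may vary slightly across TRL versions.

```python
# Hedged reconstruction of the training setup from the hyperparameters above.
# The dataset is unknown (see "Training and evaluation data"), so a placeholder
# corpus stands in for it.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Placeholder corpus: the actual training data is not documented in this card.
train_dataset = Dataset.from_dict({"text": ["example document"] * 128})

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=2,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    include_num_input_tokens_seen=True,  # logs the "Input Tokens Seen" column below
    dataset_text_field="text",
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model, loaded from the Hub by id
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```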

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4664        | 0.0531 | 5    | 1.2779          | 265768            |
| 1.0297        | 0.1062 | 10   | 1.2239          | 526776            |
| 0.9672        | 0.1594 | 15   | 1.2051          | 794288            |
| 0.9285        | 0.2125 | 20   | 1.2391          | 1063824           |
| 0.7632        | 0.2656 | 25   | 1.2306          | 1332408           |
| 0.7406        | 0.3187 | 30   | 1.2478          | 1595464           |
| 0.6883        | 0.3718 | 35   | 1.2507          | 1871024           |
| 0.5929        | 0.4250 | 40   | 1.2429          | 2133560           |
| 0.4589        | 0.4781 | 45   | 1.2391          | 2394480           |
| 0.6095        | 0.5312 | 50   | 1.2221          | 2663544           |
| 0.5181        | 0.5843 | 55   | 1.2246          | 2930064           |
| 0.4917        | 0.6375 | 60   | 1.2135          | 3199536           |
| 0.5105        | 0.6906 | 65   | 1.2249          | 3465264           |
| 0.4253        | 0.7437 | 70   | 1.2138          | 3727952           |
| 0.4506        | 0.7968 | 75   | 1.2148          | 3991304           |
| 0.4301        | 0.8499 | 80   | 1.2095          | 4255664           |
| 0.432         | 0.9031 | 85   | 1.2015          | 4523456           |
| 0.3698        | 0.9562 | 90   | 1.2208          | 4781552           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
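To sanity-check a local environment against the versions above, a quick illustrative check; exact pins matter only for reproducing training, and newer versions will usually load the model fine.

```python
# Print installed versions to compare against the list above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # card: 4.44.0
print("PyTorch:", torch.__version__)              # card: 2.4.0+cu121
print("Datasets:", datasets.__version__)          # card: 2.20.0
print("Tokenizers:", tokenizers.__version__)      # card: 0.19.1
```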