---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd2
    results: []
---

# collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.2160
- Num Input Tokens Seen: 4969888
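
No usage snippet is provided on this card. As a minimal inference sketch, the checkpoint can presumably be loaded with Transformers like any Gemma 2 model; the repo id below is an assumption inferred from the model name and may differ.

```python
# Minimal inference sketch. The repo id is an assumption inferred from the
# model name on this card; replace it with the actual Hub id if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 is commonly run in bfloat16
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```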

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
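
The list above maps onto a `transformers.TrainingArguments` configuration roughly as follows. This is a hedged sketch: the card's `trl`/`sft` tags suggest training with TRL's `SFTTrainer`, but the actual script, dataset, and any additional arguments are not recorded here.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above,
# using the Transformers 4.44 API noted under "Framework versions".
# The output_dir is illustrative; everything else mirrors the list.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 effective
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```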

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3282        | 0.0529 | 5    | 1.2782          | 268552            |
| 1.0606        | 0.1058 | 10   | 1.2285          | 533864            |
| 0.9673        | 0.1587 | 15   | 1.2222          | 799192            |
| 0.7577        | 0.2116 | 20   | 1.2580          | 1065712           |
| 0.7055        | 0.2646 | 25   | 1.2578          | 1334136           |
| 0.6601        | 0.3175 | 30   | 1.2654          | 1600744           |
| 0.5988        | 0.3704 | 35   | 1.2742          | 1865248           |
| 0.5391        | 0.4233 | 40   | 1.2674          | 2126184           |
| 0.5215        | 0.4762 | 45   | 1.2479          | 2389800           |
| 0.4847        | 0.5291 | 50   | 1.2539          | 2652896           |
| 0.3997        | 0.5820 | 55   | 1.2492          | 2917336           |
| 0.4981        | 0.6349 | 60   | 1.2381          | 3182592           |
| 0.422         | 0.6878 | 65   | 1.2312          | 3444800           |
| 0.4256        | 0.7407 | 70   | 1.2293          | 3706456           |
| 0.3611        | 0.7937 | 75   | 1.2366          | 3968992           |
| 0.4669        | 0.8466 | 80   | 1.2204          | 4236704           |
| 0.3871        | 0.8995 | 85   | 1.2243          | 4494952           |
| 0.4819        | 0.9524 | 90   | 1.2215          | 4752080           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1