---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd2
    results: []
---

# collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.2037
- Num Input Tokens Seen: 5033336
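The checkpoint can be loaded with the standard `transformers` API. Below is a minimal inference sketch; the repository id is assumed from the model name above, and the prompt is a placeholder:

```python
# Minimal inference sketch. The repo id is assumed from this card's model name;
# substitute a local path if you have the checkpoint on disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```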

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
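For reference, here is a sketch of how these values map onto `transformers.TrainingArguments`. The `output_dir` is a placeholder and the training data is unknown, so this is not the actual training script:

```python
# Sketch mapping the hyperparameters above onto TrainingArguments.
# output_dir is a placeholder; the dataset used for this run is unknown.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    gradient_accumulation_steps=16,  # total train batch: 8 * 16 = 128
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas and epsilon
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```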

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4057        | 0.0531 | 5    | 1.2789          | 266712            |
| 0.9946        | 0.1061 | 10   | 1.2203          | 535376            |
| 0.9751        | 0.1592 | 15   | 1.2176          | 817176            |
| 0.8049        | 0.2122 | 20   | 1.2373          | 1083600           |
| 0.7624        | 0.2653 | 25   | 1.2358          | 1352608           |
| 0.7157        | 0.3183 | 30   | 1.2521          | 1622152           |
| 0.54          | 0.3714 | 35   | 1.2346          | 1882312           |
| 0.5442        | 0.4244 | 40   | 1.2433          | 2149600           |
| 0.5808        | 0.4775 | 45   | 1.2429          | 2416240           |
| 0.4783        | 0.5305 | 50   | 1.2305          | 2682968           |
| 0.5364        | 0.5836 | 55   | 1.2256          | 2950376           |
| 0.5619        | 0.6366 | 60   | 1.2167          | 3214352           |
| 0.5027        | 0.6897 | 65   | 1.2278          | 3481120           |
| 0.4447        | 0.7427 | 70   | 1.2205          | 3747064           |
| 0.3629        | 0.7958 | 75   | 1.2205          | 4015440           |
| 0.5072        | 0.8488 | 80   | 1.2094          | 4281048           |
| 0.5246        | 0.9019 | 85   | 1.2102          | 4550336           |
| 0.5123        | 0.9549 | 90   | 1.2077          | 4814152           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1