---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter5_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_replace_iter5_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 2.2408
- Num input tokens seen: 4919592

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
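As a sanity check, the effective batch size and approximate warmup length implied by these settings can be derived directly. The total optimizer step count used below is an estimate read off the results table (step 95 ≈ epoch 0.9750), not a logged value:

```python
import math

# Hyperparameters from the list above
train_batch_size = 8
gradient_accumulation_steps = 16
lr_scheduler_warmup_ratio = 0.05

# Effective (total) train batch size: per-device batch * accumulation steps
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the reported value

# Approximate warmup length: step 95 corresponds to epoch 0.9750 in the
# results table, so 95 / 0.9750 ≈ 97 optimizer steps per epoch (an estimate)
total_steps = round(95 / 0.9750)
warmup_steps = math.ceil(lr_scheduler_warmup_ratio * total_steps)
print(warmup_steps)  # 5
```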

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.7001        | 0.0513 | 5    | 1.2745          | 250576            |
| 0.9776        | 0.1026 | 10   | 1.2553          | 507984            |
| 0.6411        | 0.1539 | 15   | 1.4389          | 759912            |
| 0.3549        | 0.2053 | 20   | 1.6506          | 1017344           |
| 0.1684        | 0.2566 | 25   | 1.8128          | 1261720           |
| 0.0927        | 0.3079 | 30   | 1.9916          | 1509936           |
| 0.088         | 0.3592 | 35   | 2.1525          | 1762648           |
| 0.0417        | 0.4105 | 40   | 2.2521          | 2020112           |
| 0.0409        | 0.4618 | 45   | 2.2578          | 2273928           |
| 0.0342        | 0.5131 | 50   | 2.2295          | 2525056           |
| 0.0366        | 0.5645 | 55   | 2.2589          | 2779656           |
| 0.0259        | 0.6158 | 60   | 2.2810          | 3029816           |
| 0.0289        | 0.6671 | 65   | 2.2621          | 3284000           |
| 0.0332        | 0.7184 | 70   | 2.2593          | 3542064           |
| 0.0288        | 0.7697 | 75   | 2.2449          | 3801936           |
| 0.0246        | 0.8210 | 80   | 2.2357          | 4058824           |
| 0.025         | 0.8724 | 85   | 2.2324          | 4315048           |
| 0.0273        | 0.9237 | 90   | 2.2358          | 4563000           |
| 0.0226        | 0.9750 | 95   | 2.2411          | 4812984           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1