license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd0
    results: []

collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9438
  • Num Input Tokens Seen: 9944616
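
For a rough sense of scale, the evaluation loss can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model fine-tuning but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 0.9438

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss:.4f} -> perplexity {perplexity:.2f}")
```

If the loss were averaged differently (for example per sequence rather than per token), this conversion would not apply.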

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
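
The batch-size values above are related by simple arithmetic, sketched below. Two things here are assumptions rather than facts from this card: the device count (`num_devices = 1`, chosen because 4 × 32 already equals 128) and the total number of optimizer steps, which is estimated from the last logged row of the training results (step 185 at epoch 0.9746). Rounding the warmup length up is also an assumption about how the ratio is turned into a step count.

```python
import math

# Values listed in the hyperparameters above.
train_batch_size = 4
gradient_accumulation_steps = 32
warmup_ratio = 0.05

# Assumption: a single training device (not stated in the card).
num_devices = 1

# Examples consumed per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Estimate of total optimizer steps for one epoch, extrapolated from
# the last logged row (step 185 at epoch 0.9746). This is an estimate only.
estimated_total_steps = round(185 / 0.9746)

# Assumption: the warmup length is the ratio times total steps, rounded up.
estimated_warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)

print(total_train_batch_size, estimated_total_steps, estimated_warmup_steps)
```

Under these assumptions the learning rate would ramp up over roughly the first ten optimizer steps and then stay constant at 8e-06.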

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.1198        | 0.0263 | 5    | 1.1072          | 256304            |
| 1.0654        | 0.0527 | 10   | 1.0185          | 519116            |
| 0.8862        | 0.0790 | 15   | 0.9889          | 775168            |
| 0.8666        | 0.1054 | 20   | 0.9891          | 1038920           |
| 0.7782        | 0.1317 | 25   | 0.9886          | 1306500           |
| 0.6537        | 0.1581 | 30   | 0.9872          | 1568200           |
| 0.7345        | 0.1844 | 35   | 0.9877          | 1831700           |
| 0.6292        | 0.2107 | 40   | 0.9795          | 2092712           |
| 0.6696        | 0.2371 | 45   | 0.9755          | 2353476           |
| 0.5445        | 0.2634 | 50   | 0.9722          | 2620524           |
| 0.6364        | 0.2898 | 55   | 0.9687          | 2886160           |
| 0.6564        | 0.3161 | 60   | 0.9671          | 3149304           |
| 0.5167        | 0.3424 | 65   | 0.9640          | 3413380           |
| 0.6553        | 0.3688 | 70   | 0.9627          | 3684636           |
| 0.5201        | 0.3951 | 75   | 0.9603          | 3947600           |
| 0.5839        | 0.4215 | 80   | 0.9603          | 4207528           |
| 0.5599        | 0.4478 | 85   | 0.9587          | 4468996           |
| 0.6981        | 0.4742 | 90   | 0.9590          | 4730728           |
| 0.582         | 0.5005 | 95   | 0.9558          | 4991328           |
| 0.5174        | 0.5268 | 100  | 0.9556          | 5253436           |
| 0.6031        | 0.5532 | 105  | 0.9545          | 5518624           |
| 0.6314        | 0.5795 | 110  | 0.9528          | 5780988           |
| 0.4925        | 0.6059 | 115  | 0.9527          | 6041796           |
| 0.5823        | 0.6322 | 120  | 0.9515          | 6307948           |
| 0.5974        | 0.6585 | 125  | 0.9498          | 6573748           |
| 0.4411        | 0.6849 | 130  | 0.9492          | 6836544           |
| 0.4604        | 0.7112 | 135  | 0.9489          | 7098504           |
| 0.564         | 0.7376 | 140  | 0.9475          | 7354740           |
| 0.5769        | 0.7639 | 145  | 0.9477          | 7620140           |
| 0.4886        | 0.7903 | 150  | 0.9468          | 7884420           |
| 0.5637        | 0.8166 | 155  | 0.9462          | 8151036           |
| 0.5161        | 0.8429 | 160  | 0.9460          | 8414540           |
| 0.633         | 0.8693 | 165  | 0.9459          | 8677992           |
| 0.5239        | 0.8956 | 170  | 0.9446          | 8937256           |
| 0.6149        | 0.9220 | 175  | 0.9465          | 9204996           |
| 0.5386        | 0.9483 | 180  | 0.9451          | 9467132           |
| 0.6638        | 0.9746 | 185  | 0.9446          | 9732120           |
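
A few summary figures can be read off the training results, as in the sketch below. It uses only a subset of rows copied from the table, so the "best" checkpoint it reports is the best among those rows. The per-example token estimate assumes that each optimizer step consumes 128 examples (the total train batch size) and that "Input Tokens Seen" counts every token fed to the model, neither of which this card confirms.

```python
# (step, validation_loss, input_tokens_seen): a subset of rows from the table.
rows = [
    (0, 1.2335, 0),
    (5, 1.1072, 256304),
    (50, 0.9722, 2620524),
    (100, 0.9556, 5253436),
    (150, 0.9468, 7884420),
    (170, 0.9446, 8937256),
    (175, 0.9465, 9204996),
    (185, 0.9446, 9732120),
]

# Lowest validation loss among the selected rows (first occurrence on ties).
best_step, best_loss, _ = min(rows, key=lambda r: r[1])

# Average tokens consumed per optimizer step, from the last logged row.
last_step, _, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

# Assumption: 128 examples per optimizer step (total_train_batch_size).
tokens_per_example = tokens_per_step / 128

print(f"best logged validation loss: {best_loss} at step {best_step}")
print(f"~{tokens_per_step:.0f} tokens/step, ~{tokens_per_example:.0f} tokens/example")
```

Validation loss drops quickly in the first 15 steps and then improves slowly, while training loss keeps falling well below it; the card gives no explanation for that gap.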

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1