metadata
license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd2
    results: []

collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd2

This model is a fine-tuned version of google/gemma-2-9b. The fine-tuning dataset is not recorded in this card. It achieves the following results on the evaluation set:

  • Loss: 0.9422
  • Num Input Tokens Seen: 9681676
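
For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model fine-tuning but is not stated in this card.

```python
import math

# Evaluation loss reported above.
eval_loss = 0.9422

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```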

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
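
The batch-size figures above can be cross-checked with a little arithmetic. This is a minimal sketch, assuming the usual relation total = per-device batch size × accumulation steps × number of devices. The device count is not reported in this card and is only inferred here. The total step count is likewise an estimate taken from the results table (step 180 at epoch 0.9764), and the warmup length assumes the scheduler rounds the warmup fraction up.

```python
import math

# Hyperparameters copied from the list above.
train_batch_size = 4              # per-device micro-batch size
gradient_accumulation_steps = 32
total_train_batch_size = 128      # as reported by the trainer
lr_scheduler_warmup_ratio = 0.05

# Sequences contributing to one optimizer step on a single device.
per_device_effective = train_batch_size * gradient_accumulation_steps

# Assumption: total = per_device * accumulation * num_devices.
implied_num_devices = total_train_batch_size // per_device_effective

# Estimate of optimizer steps in the single epoch, from the results table:
# step 180 corresponds to epoch 0.9764.
estimated_total_steps = round(180 / 0.9764)

# Assumption: the warmup length is the warmup ratio times the total steps, rounded up.
estimated_warmup_steps = math.ceil(lr_scheduler_warmup_ratio * estimated_total_steps)

print(per_device_effective, implied_num_devices)
print(estimated_total_steps, estimated_warmup_steps)
```

Under these assumptions, the reported total of 128 is consistent with training on a single device, and the learning rate would reach its constant value of 8e-06 after roughly the first ten optimizer steps.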

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2828        | 0.0271 | 5    | 1.1073          | 265560            |
| 1.0134        | 0.0542 | 10   | 1.0202          | 534484            |
| 1.033         | 0.0814 | 15   | 0.9863          | 798368            |
| 0.8674        | 0.1085 | 20   | 0.9835          | 1059736           |
| 0.8101        | 0.1356 | 25   | 0.9846          | 1317664           |
| 0.7184        | 0.1627 | 30   | 0.9897          | 1578212           |
| 0.7122        | 0.1899 | 35   | 0.9834          | 1838956           |
| 0.7129        | 0.2170 | 40   | 0.9781          | 2102368           |
| 0.643         | 0.2441 | 45   | 0.9751          | 2365072           |
| 0.6169        | 0.2712 | 50   | 0.9738          | 2626376           |
| 0.7176        | 0.2984 | 55   | 0.9700          | 2886524           |
| 0.5972        | 0.3255 | 60   | 0.9665          | 3149448           |
| 0.573         | 0.3526 | 65   | 0.9639          | 3415664           |
| 0.6035        | 0.3797 | 70   | 0.9629          | 3676764           |
| 0.6096        | 0.4068 | 75   | 0.9598          | 3940104           |
| 0.5832        | 0.4340 | 80   | 0.9585          | 4204740           |
| 0.6262        | 0.4611 | 85   | 0.9572          | 4467556           |
| 0.6814        | 0.4882 | 90   | 0.9555          | 4731864           |
| 0.6672        | 0.5153 | 95   | 0.9533          | 4997040           |
| 0.5181        | 0.5425 | 100  | 0.9519          | 5263636           |
| 0.5759        | 0.5696 | 105  | 0.9515          | 5527476           |
| 0.597         | 0.5967 | 110  | 0.9507          | 5790324           |
| 0.5898        | 0.6238 | 115  | 0.9501          | 6054116           |
| 0.6857        | 0.6510 | 120  | 0.9496          | 6313184           |
| 0.5666        | 0.6781 | 125  | 0.9490          | 6573064           |
| 0.5007        | 0.7052 | 130  | 0.9491          | 6839704           |
| 0.5295        | 0.7323 | 135  | 0.9473          | 7101760           |
| 0.5782        | 0.7595 | 140  | 0.9458          | 7359916           |
| 0.5476        | 0.7866 | 145  | 0.9456          | 7629448           |
| 0.5752        | 0.8137 | 150  | 0.9457          | 7888404           |
| 0.48          | 0.8408 | 155  | 0.9444          | 8151276           |
| 0.6858        | 0.8679 | 160  | 0.9448          | 8410464           |
| 0.569         | 0.8951 | 165  | 0.9454          | 8677664           |
| 0.5906        | 0.9222 | 170  | 0.9441          | 8944556           |
| 0.5673        | 0.9493 | 175  | 0.9441          | 9201860           |
| 0.6069        | 0.9764 | 180  | 0.9446          | 9467072           |
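
One way to read these results is to compare training and validation loss at a few checkpoints. The sketch below uses a hand-picked subset of rows from the table. The helper name `generalization_gap` is illustrative only. The logged training loss appears to be a per-step value and is therefore noisy, so the gaps should be treated as rough indications.

```python
# A subset of rows from the results table: (step, training_loss, validation_loss).
rows = [
    (5, 1.2828, 1.1073),
    (50, 0.6169, 0.9738),
    (100, 0.5181, 0.9519),
    (150, 0.5752, 0.9457),
    (170, 0.5906, 0.9441),
    (180, 0.6069, 0.9446),
]


def generalization_gap(train_loss, val_loss):
    """Validation loss minus training loss (hypothetical helper)."""
    return val_loss - train_loss


# Checkpoint with the lowest validation loss within this subset.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Gap at each selected step, rounded for display.
gaps = {step: round(generalization_gap(t, v), 4) for step, t, v in rows}

print(best_step, best_val)
for step, gap in gaps.items():
    print(step, gap)
```

In this subset the validation loss flattens out near 0.944 while the logged training loss sits between roughly 0.5 and 0.6, so most of the validation improvement occurs in the first part of the epoch.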

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
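
To reproduce the environment, it can help to compare the installed packages against the versions listed above. This is a minimal sketch using only the standard library. The function name `find_mismatches` is hypothetical, and the mapping from the names above to PyPI distribution names (for example, Pytorch to `torch`) is an assumption.

```python
from importlib import metadata

# Versions listed above; distribution names are assumed PyPI names.
PINNED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {name: (expected, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

The comparison is an exact string match, so a build with a different local suffix (for example a CPU-only PyTorch build) is reported as a mismatch even when the base version agrees.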