---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd1
    results: []
---

collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1630
  • Num Input Tokens Seen: 5028016
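The checkpoint can be loaded with the transformers library. The snippet below is a minimal sketch; the Hub repository id is an assumption inferred from the model name on this card, so substitute the actual path if it differs:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, derived from the model name on this card.
repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Short generation smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```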

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a configuration sketch reproducing them follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128 (train_batch_size 8 × gradient_accumulation_steps 16)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
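As referenced above, here is a minimal sketch of a transformers TrainingArguments object matching these values. The output_dir is a placeholder, and this card's trl/sft tags suggest the arguments were consumed by TRL's SFTTrainer rather than the plain Trainer:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # effective batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",  # linear warmup, then constant LR
    warmup_ratio=0.05,  # warm up over the first 5% of training steps
    num_train_epochs=1,
    adam_beta1=0.9,  # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```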

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4282        | 0.0533 | 5    | 1.2707          | 272080            |
| 1.0872        | 0.1065 | 10   | 1.1923          | 534176            |
| 1.0069        | 0.1598 | 15   | 1.1791          | 810424            |
| 0.9708        | 0.2130 | 20   | 1.1779          | 1085464           |
| 0.841         | 0.2663 | 25   | 1.1966          | 1354360           |
| 0.7559        | 0.3196 | 30   | 1.2040          | 1626064           |
| 0.726         | 0.3728 | 35   | 1.1940          | 1897496           |
| 0.7034        | 0.4261 | 40   | 1.1953          | 2174648           |
| 0.5682        | 0.4794 | 45   | 1.1947          | 2445704           |
| 0.575         | 0.5326 | 50   | 1.1886          | 2714920           |
| 0.566         | 0.5859 | 55   | 1.1807          | 2982200           |
| 0.5243        | 0.6391 | 60   | 1.1784          | 3246752           |
| 0.5905        | 0.6924 | 65   | 1.1718          | 3518224           |
| 0.473         | 0.7457 | 70   | 1.1766          | 3783208           |
| 0.5029        | 0.7989 | 75   | 1.1662          | 4047576           |
| 0.5819        | 0.8522 | 80   | 1.1747          | 4321368           |
| 0.5147        | 0.9055 | 85   | 1.1620          | 4594208           |
| 0.4796        | 0.9587 | 90   | 1.1722          | 4862792           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
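A small sketch for verifying that a local environment matches these pinned versions; the expected version strings are taken from the list above:

```python
import datasets
import tokenizers
import torch
import transformers

# Versions listed in this card's "Framework versions" section.
expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"mismatch (expected {want})"
    print(f"{name}: {have} {status}")
```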