---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: gemma-2-2b_hs2_iter1_sftsd1
    results: []
---

# gemma-2-2b_hs2_iter1_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.4676
- Num Input Tokens Seen: 8712000
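
As a quick sanity check, the checkpoint can be loaded with the standard `transformers` API. This is a minimal sketch; the repository id `jkazdan/gemma-2-2b_hs2_iter1_sftsd1` is assumed from the model name above and may need adjusting.

```python
# Minimal loading sketch; the repo id below is assumed from this card's
# model name, not confirmed by the card itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd1"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```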

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching configuration follows this list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 16
- num_epochs: 1
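
For reference, here is a minimal sketch of how these values map onto `transformers.TrainingArguments` (as used by TRL's `SFTTrainer`). The `output_dir` is a placeholder, and dataset handling is omitted since the card does not record it.

```python
# Sketch mapping the hyperparameters above onto TrainingArguments;
# output_dir is an assumed placeholder, not taken from this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b_hs2_iter1_sftsd1",  # assumed output path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = effective batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=16,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```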

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.7677        | 0.0320 | 5    | 1.3656          | 271360            |
| 1.5854        | 0.0639 | 10   | 1.2596          | 545792            |
| 1.4739        | 0.0959 | 15   | 1.1954          | 825128            |
| 1.2857        | 0.1279 | 20   | 1.1738          | 1108408           |
| 1.184         | 0.1599 | 25   | 1.1751          | 1388256           |
| 0.9207        | 0.1918 | 30   | 1.2445          | 1667528           |
| 0.8665        | 0.2238 | 35   | 1.2921          | 1949712           |
| 0.7163        | 0.2558 | 40   | 1.4105          | 2223872           |
| 0.5853        | 0.2878 | 45   | 1.4211          | 2497184           |
| 0.5139        | 0.3197 | 50   | 1.5440          | 2777320           |
| 0.4299        | 0.3517 | 55   | 1.5069          | 3057528           |
| 0.3458        | 0.3837 | 60   | 1.5679          | 3331488           |
| 0.2913        | 0.4157 | 65   | 1.5084          | 3611304           |
| 0.2654        | 0.4476 | 70   | 1.5051          | 3897800           |
| 0.2249        | 0.4796 | 75   | 1.5396          | 4174176           |
| 0.3085        | 0.5116 | 80   | 1.5069          | 4450528           |
| 0.1601        | 0.5436 | 85   | 1.5507          | 4732680           |
| 0.1126        | 0.5755 | 90   | 1.4520          | 5015288           |
| 0.1922        | 0.6075 | 95   | 1.4548          | 5296336           |
| 0.1709        | 0.6395 | 100  | 1.4422          | 5578672           |
| 0.1558        | 0.6715 | 105  | 1.4477          | 5860328           |
| 0.0981        | 0.7034 | 110  | 1.4791          | 6141568           |
| 0.1635        | 0.7354 | 115  | 1.4351          | 6424480           |
| 0.1061        | 0.7674 | 120  | 1.4498          | 6706048           |
| 0.159         | 0.7994 | 125  | 1.4220          | 6990040           |
| 0.0759        | 0.8313 | 130  | 1.4819          | 7264776           |
| 0.0897        | 0.8633 | 135  | 1.4187          | 7543304           |
| 0.1316        | 0.8953 | 140  | 1.4371          | 7823792           |
| 0.1955        | 0.9273 | 145  | 1.4277          | 8101728           |
| 0.1215        | 0.9592 | 150  | 1.4345          | 8378768           |
| 0.1174        | 0.9912 | 155  | 1.4765          | 8659448           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
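
A small sketch for verifying a local environment against the versions listed above; the expected strings are taken directly from this card, and any mismatch is only a warning, not necessarily a problem.

```python
# Check installed package versions against those recorded in this card.
from importlib.metadata import PackageNotFoundError, version

expected_versions = [
    ("transformers", "4.44.0"),
    ("torch", "2.4.0+cu121"),
    ("datasets", "2.20.0"),
    ("tokenizers", "0.19.1"),
]

for pkg, expected in expected_versions:
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed (expected {expected})")
        continue
    status = "OK" if installed == expected else f"expected {expected}"
    print(f"{pkg}: {installed} ({status})")
```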