---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1
    results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.1371
- Num Input Tokens Seen: 5264318
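
As a usage sketch (not part of the original card), the checkpoint should load with the standard `transformers` API. The repo id below is inferred from the model name above and is an assumption; adjust it if the checkpoint lives elsewhere.

```python
# Minimal loading/generation sketch. Assumption: the checkpoint is hosted
# under the repo id below (inferred from the model name above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation for a toy prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```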

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
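
Given the `trl`/`sft` tags, these settings presumably map onto TRL's `SFTTrainer`. The sketch below is a hedged reconstruction, not the original training script: the training data is unknown, so a dummy dataset stands in, and exact `SFTTrainer` keyword arguments vary across TRL versions.

```python
# Hedged reconstruction of the configuration above; not the original script.
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,                # lr_scheduler_warmup_ratio: 0.05
    num_train_epochs=1,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the Trainer defaults.
)

# The training data is unknown ("More information needed" above); this dummy
# text dataset exists only to make the sketch self-contained.
train_dataset = Dataset.from_dict({"text": ["placeholder example"] * 128})

trainer = SFTTrainer(
    model="google/gemma-2-2b",        # base model from the card metadata
    args=args,
    train_dataset=train_dataset,
    dataset_text_field="text",        # kwarg location differs by TRL version
)
trainer.train()
```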

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4103        | 0.0538 | 5    | 1.2645          | 285920            |
| 1.3042        | 0.1075 | 10   | 1.1766          | 573248            |
| 1.2148        | 0.1613 | 15   | 1.1511          | 853008            |
| 1.1212        | 0.2151 | 20   | 1.1320          | 1136200           |
| 1.0406        | 0.2688 | 25   | 1.1354          | 1412848           |
| 1.1139        | 0.3226 | 30   | 1.1353          | 1691488           |
| 0.93          | 0.3763 | 35   | 1.1526          | 1969840           |
| 0.9491        | 0.4301 | 40   | 1.1464          | 2249384           |
| 0.8255        | 0.4839 | 45   | 1.1531          | 2529768           |
| 0.8226        | 0.5376 | 50   | 1.1457          | 2813872           |
| 0.8505        | 0.5914 | 55   | 1.1510          | 3105400           |
| 0.7052        | 0.6452 | 60   | 1.1498          | 3391880           |
| 0.7749        | 0.6989 | 65   | 1.1413          | 3678440           |
| 0.6941        | 0.7527 | 70   | 1.1457          | 3959096           |
| 0.6859        | 0.8065 | 75   | 1.1410          | 4248384           |
| 0.5947        | 0.8602 | 80   | 1.1435          | 4534176           |
| 0.6197        | 0.9140 | 85   | 1.1393          | 4811344           |
| 0.5752        | 0.9677 | 90   | 1.1362          | 5094360           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
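
To compare a local environment against these pins, a small sketch (assuming the four packages are importable):

```python
# Print installed package versions next to those listed in the card.
import datasets
import tokenizers
import torch
import transformers

for pkg, want in [
    (transformers, "4.44.0"),
    (torch, "2.4.0"),       # card lists 2.4.0+cu121; local build suffix may differ
    (datasets, "2.20.0"),
    (tokenizers, "0.19.1"),
]:
    print(f"{pkg.__name__}: installed {pkg.__version__}, card used {want}")
```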