collapse_gemma-2-2b_hs2_replace_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1795
  • Num Input Tokens Seen: 5060896
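For intuition, a cross-entropy loss of 2.1795 corresponds to a perplexity of roughly 8.84, since perplexity is the exponential of the mean loss:

```python
import math

# Evaluation loss reported above
eval_loss = 2.1795

# Perplexity = exp(mean cross-entropy loss)
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # → 8.84
```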

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
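As a sanity check, the reported total train batch size follows from the per-device batch size and gradient accumulation. A minimal sketch (the variable names below simply mirror the hyperparameters above):

```python
train_batch_size = 8
gradient_accumulation_steps = 16

# Effective batch size = per-device batch size * accumulation steps
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 128

# With roughly 97 optimizer steps in the single epoch (see the training
# results below, where step 95 falls at epoch 0.975), a warmup ratio of
# 0.05 implies on the order of 4-5 warmup steps.
```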

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.4673          0.0513   5      1.2702            264264
1.0669          0.1026   10     1.2304            526720
0.7222          0.1539   15     1.3963            784504
0.5116          0.2053   20     1.5787            1043032
0.2412          0.2566   25     1.7312            1300648
0.1173          0.3079   30     1.9555            1556464
0.0997          0.3592   35     2.0520            1819864
0.0682          0.4105   40     2.2350            2079728
0.0725          0.4618   45     2.2192            2342280
0.0317          0.5131   50     2.1794            2602208
0.0312          0.5645   55     2.1793            2868280
0.0593          0.6158   60     2.1587            3132504
0.0364          0.6671   65     2.1674            3394504
0.0304          0.7184   70     2.1789            3655584
0.0329          0.7697   75     2.1921            3911664
0.0333          0.8210   80     2.1670            4171336
0.0283          0.8724   85     2.1704            4428472
0.0297          0.9237   90     2.1697            4685656
0.0241          0.9750   95     2.1790            4955984
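Training loss falls steadily while validation loss bottoms out early and then climbs, so the logged rows are worth scanning for the best checkpoint. A minimal sketch, where `val_losses` is just the (step, validation loss) pairs transcribed from the table above:

```python
# (step, validation_loss) pairs from the training results table
val_losses = [
    (0, 1.3909), (5, 1.2702), (10, 1.2304), (15, 1.3963),
    (20, 1.5787), (25, 1.7312), (30, 1.9555), (35, 2.0520),
    (40, 2.2350), (45, 2.2192), (50, 2.1794), (55, 2.1793),
    (60, 2.1587), (65, 2.1674), (70, 2.1789), (75, 2.1921),
    (80, 2.1670), (85, 2.1704), (90, 2.1697), (95, 2.1790),
]

# Pick the step with the lowest validation loss
best_step, best_loss = min(val_losses, key=lambda pair: pair[1])
print(best_step, best_loss)  # → 10 1.2304
```

Validation loss is minimized at step 10 and rises thereafter even as training loss keeps dropping, a pattern consistent with overfitting on this run.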

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size

  • 2.61B params (Safetensors, BF16)
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter4_sftsd1

  • Base model: google/gemma-2-2b