collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1796
  • Num Input Tokens Seen: 5038648
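
A minimal usage sketch, assuming the checkpoint is loaded from the Hugging Face Hub with the `transformers` library (the repo id below is taken from this card's title; the prompt is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id from this card's title; gemma-2-2b weights are stored in bfloat16.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```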

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `transformers.TrainingArguments` follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
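
A minimal sketch of how the values above map onto `transformers.TrainingArguments` (Transformers 4.44.0); the `output_dir` is a placeholder, and `bf16=True` is an assumption inferred from the BF16 checkpoint rather than something stated in this card. Note that the total train batch size of 128 follows from 8 (per-device) × 16 (gradient accumulation steps) on a single device:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # effective batch = 8 * 16 = 128
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: matches the BF16 weights; not stated in the card
)
```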

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.381         | 0.0531 | 5    | 1.2701          | 271832            |
| 1.1988        | 0.1062 | 10   | 1.1934          | 544104            |
| 1.0069        | 0.1594 | 15   | 1.1852          | 802480            |
| 0.86          | 0.2125 | 20   | 1.1991          | 1070600           |
| 0.8632        | 0.2656 | 25   | 1.2041          | 1347848           |
| 0.7896        | 0.3187 | 30   | 1.2265          | 1614736           |
| 0.6294        | 0.3718 | 35   | 1.2269          | 1874656           |
| 0.5769        | 0.4250 | 40   | 1.2244          | 2140272           |
| 0.4774        | 0.4781 | 45   | 1.2120          | 2408424           |
| 0.5257        | 0.5312 | 50   | 1.2074          | 2680984           |
| 0.501         | 0.5843 | 55   | 1.2010          | 2949464           |
| 0.4729        | 0.6375 | 60   | 1.1885          | 3214888           |
| 0.4757        | 0.6906 | 65   | 1.1828          | 3485080           |
| 0.4514        | 0.7437 | 70   | 1.1845          | 3751008           |
| 0.4081        | 0.7968 | 75   | 1.1793          | 4016424           |
| 0.4307        | 0.8499 | 80   | 1.1869          | 4289272           |
| 0.4335        | 0.9031 | 85   | 1.1811          | 4558712           |
| 0.3815        | 0.9562 | 90   | 1.1835          | 4822832           |
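
For reference, assuming the validation loss is the standard mean per-token cross-entropy in nats (the default for causal-LM training in `transformers`), it can be read as a perplexity via exp(loss):

```python
import math

# Final eval loss reported on this card; perplexity = exp(cross-entropy).
print(math.exp(1.1796))  # ~3.25
```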

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
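
To reproduce this environment, the versions above can be pinned, e.g. with `pip install transformers==4.44.0 torch==2.4.0 datasets==2.20.0 tokenizers==0.19.1` (the PyTorch build listed additionally targets CUDA 12.1).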