collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1838
  • Num Input Tokens Seen: 5084280
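
No usage snippet is documented on this card. As a minimal sketch, assuming the standard transformers causal-LM API, the checkpoint could be loaded like this (the prompt is only an illustration; loading in bfloat16 matches the BF16 tensor type listed below):

```python
# Minimal usage sketch; this card does not document intended usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```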

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
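
These settings map onto Hugging Face TrainingArguments roughly as follows. This is a hedged reconstruction: only the hyperparameter values come from this card, while the output directory name and the bf16 flag are assumptions, and the dataset/Trainer wiring is not specified here.

```python
# Hedged reconstruction of the training configuration from the list above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd2",  # assumed name
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption: matches the BF16 tensor type reported below
)
```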

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0           0   1.3909                     0
1.3084          0.0532      5   1.2703                275232
1.1373          0.1065     10   1.1957                547312
1.1446          0.1597     15   1.1790                822336
0.9553          0.2129     20   1.1806               1090224
0.8531          0.2661     25   1.2067               1364032
0.7999          0.3194     30   1.2086               1638152
0.8383          0.3726     35   1.2081               1910656
0.6788          0.4258     40   1.2046               2184752
0.5638          0.4790     45   1.2050               2460296
0.7359          0.5323     50   1.1890               2726344
0.5884          0.5855     55   1.2006               2997416
0.5682          0.6387     60   1.1961               3277152
0.5166          0.6919     65   1.1880               3552712
0.6191          0.7452     70   1.1862               3828400
0.4679          0.7984     75   1.1922               4104744
0.5175          0.8516     80   1.1861               4374456
0.4754          0.9049     85   1.1885               4650432
0.4890          0.9581     90   1.1839               4922224

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
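
To check a local environment against these pinned versions, a quick sketch (all four packages expose a __version__ attribute):

```python
# Compare installed versions with those listed above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected 4.44.0
print(torch.__version__)         # expected 2.4.0+cu121
print(datasets.__version__)      # expected 2.20.0
print(tokenizers.__version__)    # expected 0.19.1
```
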
Model size: 2.61B params (Safetensors, BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd2

Base model: google/gemma-2-2b