collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1833
  • Num Input Tokens Seen: 5086248
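
The checkpoint can be loaded with the standard transformers causal-LM API. A minimal sketch follows; the prompt and generation settings are illustrative only, since the card does not document an intended prompt format (bfloat16 matches the dtype the repository reports for the stored weights):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd2"

# Load the fine-tuned checkpoint; bfloat16 matches the stored tensor type.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative prompt; replace with your own input.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```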

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
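
Expressed as transformers TrainingArguments, the settings above look roughly like the sketch below. This assumes a single training device, so per_device_train_batch_size × gradient_accumulation_steps = 8 × 16 = 128 matches the reported total train batch size; output_dir is a placeholder, and model/dataset setup is omitted:

```python
from transformers import TrainingArguments

# Hyperparameters taken from this model card; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # 8 * 16 accumulation steps = 128 total (single device assumed)
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```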

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3794        | 0.0534 | 5    | 1.2685          | 268544            |
| 1.1344        | 0.1068 | 10   | 1.1908          | 540984            |
| 1.0920        | 0.1602 | 15   | 1.1709          | 814280            |
| 0.9745        | 0.2136 | 20   | 1.1742          | 1088888           |
| 0.8742        | 0.2670 | 25   | 1.1718          | 1360592           |
| 0.9279        | 0.3204 | 30   | 1.1893          | 1633024           |
| 0.8757        | 0.3738 | 35   | 1.1800          | 1905464           |
| 0.7368        | 0.4272 | 40   | 1.2066          | 2182616           |
| 0.7263        | 0.4806 | 45   | 1.1794          | 2457160           |
| 0.5811        | 0.5340 | 50   | 1.1940          | 2735040           |
| 0.5781        | 0.5874 | 55   | 1.1842          | 3007976           |
| 0.6488        | 0.6409 | 60   | 1.1876          | 3283704           |
| 0.6015        | 0.6943 | 65   | 1.1807          | 3548216           |
| 0.6332        | 0.7477 | 70   | 1.1787          | 3816768           |
| 0.6380        | 0.8011 | 75   | 1.1893          | 4083896           |
| 0.6347        | 0.8545 | 80   | 1.1804          | 4371088           |
| 0.5831        | 0.9079 | 85   | 1.1794          | 4646192           |
| 0.5994        | 0.9613 | 90   | 1.1799          | 4922280           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1