collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1630
  • Num Input Tokens Seen: 5028016
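
A minimal loading sketch, assuming the standard Gemma 2 causal-LM interface in Transformers; the prompt is illustrative only, and BF16 matches the stored tensor type of this checkpoint:

```python
# Minimal sketch (not an official usage snippet): load the checkpoint and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Write a haiku about fine-tuning:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```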

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a Trainer-API sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
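
For reference, a minimal sketch (not the original training script) of how these settings map onto Hugging Face `TrainingArguments`; the output directory is a hypothetical placeholder, and the Adam betas/epsilon listed above are the Trainer defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b-sft",      # hypothetical placeholder, not the actual path
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size above
    per_device_eval_batch_size=16,    # eval_batch_size above
    seed=1,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                        # checkpoint is stored in BF16
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults.
)
```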

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4282        | 0.0533 | 5    | 1.2707          | 272080            |
| 1.0872        | 0.1065 | 10   | 1.1923          | 534176            |
| 1.0069        | 0.1598 | 15   | 1.1791          | 810424            |
| 0.9708        | 0.2130 | 20   | 1.1779          | 1085464           |
| 0.841         | 0.2663 | 25   | 1.1966          | 1354360           |
| 0.7559        | 0.3196 | 30   | 1.2040          | 1626064           |
| 0.726         | 0.3728 | 35   | 1.1940          | 1897496           |
| 0.7034        | 0.4261 | 40   | 1.1953          | 2174648           |
| 0.5682        | 0.4794 | 45   | 1.1947          | 2445704           |
| 0.575         | 0.5326 | 50   | 1.1886          | 2714920           |
| 0.566         | 0.5859 | 55   | 1.1807          | 2982200           |
| 0.5243        | 0.6391 | 60   | 1.1784          | 3246752           |
| 0.5905        | 0.6924 | 65   | 1.1718          | 3518224           |
| 0.473         | 0.7457 | 70   | 1.1766          | 3783208           |
| 0.5029        | 0.7989 | 75   | 1.1662          | 4047576           |
| 0.5819        | 0.8522 | 80   | 1.1747          | 4321368           |
| 0.5147        | 0.9055 | 85   | 1.1620          | 4594208           |
| 0.4796        | 0.9587 | 90   | 1.1722          | 4862792           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model weights

  • Model size: 2.61B params
  • Tensor type: BF16 (Safetensors)

Base model

  • google/gemma-2-2b