collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1837
  • Num Input Tokens Seen: 5013888
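
For reference, below is a minimal sketch of loading this checkpoint for inference with Transformers; the prompt and generation settings are illustrative, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint tensors are stored in BF16
    device_map="auto",           # requires accelerate; remove to load on CPU
)

prompt = "The capital of France is"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```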

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch in code follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
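
Mapped onto Transformers `TrainingArguments`, the configuration above corresponds roughly to the sketch below. This is a reconstruction under stated assumptions, not the original training script: the model, dataset, and `Trainer` wiring are omitted because the training data is undocumented, and `bf16=True` is an assumption based on the checkpoint's BF16 tensor type.

```python
from transformers import TrainingArguments

# Reconstruction sketch of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # 8 x 16 accumulation steps = 128 total
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: inferred from BF16 tensor type
)
```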

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|--------------:|-------:|-----:|----------------:|------------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4549        | 0.0530 | 5    | 1.2734          | 272264            |
| 1.0822        | 0.1060 | 10   | 1.1956          | 535104            |
| 0.9981        | 0.1590 | 15   | 1.1870          | 799192            |
| 0.8748        | 0.2121 | 20   | 1.2126          | 1064056           |
| 0.8208        | 0.2651 | 25   | 1.2219          | 1334048           |
| 0.7611        | 0.3181 | 30   | 1.2282          | 1604648           |
| 0.6888        | 0.3711 | 35   | 1.2310          | 1865896           |
| 0.5709        | 0.4241 | 40   | 1.2214          | 2142528           |
| 0.5934        | 0.4771 | 45   | 1.2378          | 2406648           |
| 0.5320        | 0.5302 | 50   | 1.1954          | 2673848           |
| 0.5570        | 0.5832 | 55   | 1.2070          | 2936776           |
| 0.4641        | 0.6362 | 60   | 1.2062          | 3194848           |
| 0.4939        | 0.6892 | 65   | 1.1957          | 3464616           |
| 0.3887        | 0.7422 | 70   | 1.1948          | 3732256           |
| 0.4909        | 0.7952 | 75   | 1.1860          | 4002936           |
| 0.4297        | 0.8482 | 80   | 1.1849          | 4272664           |
| 0.3908        | 0.9013 | 85   | 1.1861          | 4538816           |
| 0.3598        | 0.9543 | 90   | 1.1830          | 4797920           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd1

  • Base model: google/gemma-2-2b