collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1801
  • Num Input Tokens Seen: 5072024
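No usage guide accompanies this checkpoint, but it can be loaded like any other causal language model on the Hugging Face Hub. The snippet below is a minimal, unofficial inference sketch: the repo id is taken from this card, while the device and precision settings are assumptions (the stored tensors are BF16).

```python
# Minimal inference sketch (unofficial). Assumes `transformers` and `torch`
# are installed; device_map="auto" additionally requires `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # checkpoint tensors are stored in BF16
    device_map="auto",           # requires `accelerate`; drop for CPU-only use
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```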

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
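These settings map one-to-one onto fields of transformers.TrainingArguments. The sketch below shows that mapping under the assumption that a standard Trainer run was used; the actual training script is not published, and dataset/model setup is omitted. Note that the Adam betas (0.9, 0.999) and epsilon (1e-08) listed above are the library defaults, so they need no explicit arguments.

```python
# Unofficial sketch: the listed hyperparameters expressed as a
# transformers.TrainingArguments config (Transformers 4.44.0). Whether the
# original run used Trainer directly is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter6_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the optimizer settings listed above.
)
```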

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3996        | 0.0547 | 5    | 1.2737          | 278864            |
| 1.2739        | 0.1094 | 10   | 1.1914          | 559168            |
| 1.0342        | 0.1640 | 15   | 1.1756          | 843088            |
| 0.9859        | 0.2187 | 20   | 1.1705          | 1120864           |
| 0.8532        | 0.2734 | 25   | 1.1868          | 1389432           |
| 0.886         | 0.3281 | 30   | 1.1878          | 1659616           |
| 0.8437        | 0.3828 | 35   | 1.1879          | 1944408           |
| 0.8296        | 0.4375 | 40   | 1.1998          | 2221448           |
| 0.6965        | 0.4921 | 45   | 1.2044          | 2496880           |
| 0.7313        | 0.5468 | 50   | 1.1847          | 2774592           |
| 0.654         | 0.6015 | 55   | 1.1892          | 3058288           |
| 0.6299        | 0.6562 | 60   | 1.1958          | 3340632           |
| 0.5727        | 0.7109 | 65   | 1.1848          | 3619656           |
| 0.5546        | 0.7656 | 70   | 1.1821          | 3898496           |
| 0.632         | 0.8202 | 75   | 1.1899          | 4177160           |
| 0.5853        | 0.8749 | 80   | 1.1794          | 4460344           |
| 0.5044        | 0.9296 | 85   | 1.1827          | 4736688           |
| 0.5797        | 0.9843 | 90   | 1.1790          | 5018304           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size

  • 2.61B params (safetensors, BF16)
