collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2080
  • Num Input Tokens Seen: 5026384
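
The checkpoint can be loaded with the standard transformers API. Below is a minimal loading sketch, assuming the Hub repo id shown in the model tree at the bottom of this card; the prompt is a placeholder. For reference, an evaluation loss of 1.2080 corresponds to a perplexity of exp(1.2080) ≈ 3.35, assuming the loss is mean token cross-entropy in nats.

```python
# Minimal loading sketch (not part of the original card). Repo id taken
# from the model tree below; BF16 dtype matches the reported tensor type.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, world", return_tensors="pt")  # placeholder prompt
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```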

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch in code follows this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
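
The list above maps directly onto transformers.TrainingArguments. The sketch below is an assumption about how the run could have been configured, not the author's actual training script; the output_dir is hypothetical, and the dataset and Trainer wiring are omitted because they are not documented in this card.

```python
# Reproduction sketch of the listed hyperparameters (assumptions noted inline).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_sft",  # hypothetical path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 x 16 = effective batch size of 128
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: card reports BF16 tensors
    # Adam betas and epsilon match the card (and the library defaults).
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

With a per-device batch of 8 and 16 accumulation steps, the effective batch size is 8 × 16 = 128 on a single device, matching the total_train_batch_size listed above.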

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.3546          0.0532   5      1.2770            264160
1.0602          0.1063   10     1.2137            532584
0.9619          0.1595   15     1.2117            800536
0.8456          0.2126   20     1.2305            1064552
0.8874          0.2658   25     1.2288            1334288
0.7271          0.3189   30     1.2471            1604456
0.6848          0.3721   35     1.2268            1869408
0.6600          0.4252   40     1.2269            2137928
0.5898          0.4784   45     1.2345            2405736
0.5111          0.5316   50     1.2218            2670688
0.5592          0.5847   55     1.2104            2939792
0.4165          0.6379   60     1.2177            3205680
0.5257          0.6910   65     1.2159            3475424
0.3911          0.7442   70     1.2172            3741984
0.4243          0.7973   75     1.2121            4012288
0.5120          0.8505   80     1.2124            4271576
0.4730          0.9037   85     1.2070            4541040
0.3554          0.9568   90     1.2051            4811336
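
The table shows training loss falling steadily while validation loss plateaus near 1.21. To visualize this, here is a small sketch that replots the logged values; using matplotlib here is an assumption, not part of the original card.

```python
# Replots the loss curves from the "Training results" table above.
import matplotlib.pyplot as plt

steps = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
val_loss = [1.3909, 1.2770, 1.2137, 1.2117, 1.2305, 1.2288, 1.2471, 1.2268,
            1.2269, 1.2345, 1.2218, 1.2104, 1.2177, 1.2159, 1.2172, 1.2121,
            1.2124, 1.2070, 1.2051]
train_loss = [1.3546, 1.0602, 0.9619, 0.8456, 0.8874, 0.7271, 0.6848, 0.66,
              0.5898, 0.5111, 0.5592, 0.4165, 0.5257, 0.3911, 0.4243, 0.512,
              0.473, 0.3554]  # no training loss logged at step 0

plt.plot(steps[1:], train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```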

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd2

  • Base model: google/gemma-2-2b