collapse_gemma-2-2b_hs2_accumulatesubsample_iter11_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1919
  • Num Input Tokens Seen: 4998928

Model description

A 2.61B-parameter causal language model fine-tuned from google/gemma-2-2b and distributed as BF16 safetensors. The fine-tuning dataset and objective are otherwise undocumented.

Intended uses & limitations

More information needed
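Pending fuller documentation, the checkpoint can still be loaded for standard causal-LM inference. A minimal sketch using the transformers generation API (the prompt and generation settings are illustrative, not prescribed by this card):

```python
# Minimal inference sketch; assumes transformers >= 4.42 (Gemma 2 support)
# and accelerate installed for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter11_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are released in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```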

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
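These settings correspond one-to-one with fields of transformers TrainingArguments. A sketch under that assumption (output_dir is a placeholder; bf16=True is inferred from the BF16 tensor type of the released weights, not stated in the training log):

```python
# Hyperparameter mapping sketch; per-device batch sizes assume a single GPU,
# so 8 per device x 16 accumulation steps = the effective batch size of 128.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_sft",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: matches the BF16 dtype of the published weights
)
```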

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|--------------:|-------:|-----:|----------------:|------------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3516        | 0.0539 | 5    | 1.2769          | 268176            |
| 1.1484        | 0.1077 | 10   | 1.2134          | 539128            |
| 0.9886        | 0.1616 | 15   | 1.2129          | 816696            |
| 0.8987        | 0.2155 | 20   | 1.2197          | 1085552           |
| 0.7600        | 0.2694 | 25   | 1.2295          | 1359368           |
| 0.7064        | 0.3232 | 30   | 1.2342          | 1633752           |
| 0.7212        | 0.3771 | 35   | 1.2166          | 1907824           |
| 0.6333        | 0.4310 | 40   | 1.2243          | 2178048           |
| 0.6637        | 0.4848 | 45   | 1.2205          | 2450344           |
| 0.5582        | 0.5387 | 50   | 1.2237          | 2715928           |
| 0.5408        | 0.5926 | 55   | 1.2263          | 2988808           |
| 0.4935        | 0.6465 | 60   | 1.1999          | 3260920           |
| 0.5121        | 0.7003 | 65   | 1.2196          | 3530496           |
| 0.5136        | 0.7542 | 70   | 1.2042          | 3804800           |
| 0.4048        | 0.8081 | 75   | 1.2149          | 4080712           |
| 0.4924        | 0.8620 | 80   | 1.2057          | 4350992           |
| 0.3530        | 0.9158 | 85   | 1.2012          | 4616600           |
| 0.5064        | 0.9697 | 90   | 1.1996          | 4895192           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1