collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1876
  • Num Input Tokens Seen: 5050984
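
As a quick orientation (an illustration added here, not part of the original card), a checkpoint published under this repo id would typically be loaded as follows; the repo id is taken from the card title, and BF16 matches the published tensor type.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal loading sketch; the repo id comes from the card title.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```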

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
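
As a non-authoritative sketch, the values above map onto a Transformers TrainingArguments object roughly as follows; output_dir is a placeholder, and the Adam settings are written out explicitly even though they are the library defaults.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration implied by the list above;
# output_dir and the surrounding Trainer wiring are placeholders, not taken
# from the original card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd0",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam defaults, as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```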

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3991        | 0.0543 | 5    | 1.2725          | 268768            |
| 1.1703        | 0.1086 | 10   | 1.1974          | 544736            |
| 1.0865        | 0.1629 | 15   | 1.1898          | 814424            |
| 1.0812        | 0.2172 | 20   | 1.1993          | 1093688           |
| 0.8983        | 0.2716 | 25   | 1.2050          | 1373160           |
| 0.8093        | 0.3259 | 30   | 1.2215          | 1647872           |
| 0.734         | 0.3802 | 35   | 1.2129          | 1921464           |
| 0.6783        | 0.4345 | 40   | 1.2141          | 2206248           |
| 0.5858        | 0.4888 | 45   | 1.2226          | 2476264           |
| 0.6223        | 0.5431 | 50   | 1.2036          | 2753528           |
| 0.7186        | 0.5974 | 55   | 1.1927          | 3034280           |
| 0.452         | 0.6517 | 60   | 1.2088          | 3302232           |
| 0.5381        | 0.7060 | 65   | 1.1925          | 3575192           |
| 0.6065        | 0.7604 | 70   | 1.1956          | 3848736           |
| 0.5219        | 0.8147 | 75   | 1.1899          | 4125440           |
| 0.4986        | 0.8690 | 80   | 1.1895          | 4392672           |
| 0.4997        | 0.9233 | 85   | 1.1895          | 4661944           |
| 0.5353        | 0.9776 | 90   | 1.1928          | 4941648           |
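
Assuming the validation loss is a mean per-token cross-entropy in nats (the Transformers convention), it can be read as a perplexity by exponentiation; a quick check of the final evaluation loss (a note added here, not from the original card):

```python
import math

eval_loss = 1.1876  # final evaluation loss reported above
print(f"perplexity = {math.exp(eval_loss):.2f}")  # -> 3.28
```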

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 2.61B params (BF16, Safetensors)
