collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1675
  • Num Input Tokens Seen: 5022496
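A minimal loading sketch for this checkpoint using the Transformers AutoModel API (the repo ID is taken from the model tree at the end of this card; BF16 matches the stored tensor type; the prompt is illustrative):

```python
# Minimal sketch: load the checkpoint and generate text.
# Repo ID comes from the "Model tree" section of this card; the prompt is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# bfloat16 matches the checkpoint's stored tensor type (see "Model details" below).
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```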

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
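
A hedged sketch of an equivalent transformers.TrainingArguments configuration. Field names follow the Transformers 4.44 API listed under "Framework versions"; the output directory is a placeholder, and bf16=True is an assumption inferred from the checkpoint's BF16 tensor type:

```python
# Sketch of the hyperparameters above as a transformers.TrainingArguments config.
# Effective batch size: 8 (per-device) x 16 (accumulation steps) = 128,
# matching total_train_batch_size. output_dir is an illustrative placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: consistent with the BF16 tensor type of the checkpoint
)
```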

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.4117          0.0532   5      1.2712            266592
1.1724          0.1063   10     1.1918            533048
1.0606          0.1595   15     1.1870            799288
0.8876          0.2126   20     1.2049            1066760
0.7677          0.2658   25     1.2150            1336544
0.8230          0.3189   30     1.2520            1607968
0.6771          0.3721   35     1.2333            1874784
0.6136          0.4252   40     1.2067            2139928
0.6083          0.4784   45     1.2110            2411200
0.6399          0.5316   50     1.1935            2679224
0.5353          0.5847   55     1.1854            2944064
0.5082          0.6379   60     1.1890            3209088
0.4659          0.6910   65     1.1827            3473936
0.5292          0.7442   70     1.1786            3744800
0.4468          0.7973   75     1.1750            4009560
0.4530          0.8505   80     1.1796            4274632
0.4064          0.9037   85     1.1718            4536408
0.4862          0.9568   90     1.1720            4804824
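
For reference, the final evaluation loss of 1.1675 corresponds to a token-level perplexity of exp(1.1675) ≈ 3.21.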

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model details

  • Model size: 2.61B params (Safetensors)
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter7_sftsd1

  • Base model: google/gemma-2-2b (this model is a fine-tune of it)