collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2052
  • Num Input Tokens Seen: 4939392

Model description

More information needed

Intended uses & limitations

More information needed
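
Pending fuller documentation, a minimal loading sketch is shown below. It assumes the checkpoint follows the standard transformers causal-LM API for Gemma-2 models; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a short completion.
# Assumes the standard transformers causal-LM API; dtype/device choices are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```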

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
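
As a rough illustration, these settings map onto a transformers TrainingArguments object as sketched below. The actual training script is not published; output_dir and bf16 are assumptions.

```python
# Reconstruction of the listed hyperparameters as TrainingArguments; illustrative only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b-sft",          # placeholder, not the original path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,       # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                            # assumption, matching the BF16 checkpoint
)
```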

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.395         | 0.0528 | 5    | 1.2762          | 263616            |
| 1.025         | 0.1057 | 10   | 1.2032          | 522192            |
| 0.9043        | 0.1585 | 15   | 1.2117          | 781320            |
| 0.7869        | 0.2114 | 20   | 1.2530          | 1049592           |
| 0.7435        | 0.2642 | 25   | 1.2754          | 1306912           |
| 0.613         | 0.3170 | 30   | 1.2796          | 1574208           |
| 0.5984        | 0.3699 | 35   | 1.2686          | 1838000           |
| 0.5522        | 0.4227 | 40   | 1.2501          | 2102768           |
| 0.3169        | 0.4756 | 45   | 1.2364          | 2356712           |
| 0.4495        | 0.5284 | 50   | 1.2186          | 2617488           |
| 0.3906        | 0.5812 | 55   | 1.2323          | 2878520           |
| 0.3294        | 0.6341 | 60   | 1.2076          | 3138624           |
| 0.4019        | 0.6869 | 65   | 1.2202          | 3399776           |
| 0.3896        | 0.7398 | 70   | 1.2076          | 3658416           |
| 0.3273        | 0.7926 | 75   | 1.2138          | 3928424           |
| 0.3961        | 0.8454 | 80   | 1.2004          | 4193672           |
| 0.3151        | 0.8983 | 85   | 1.2016          | 4458168           |
| 0.3865        | 0.9511 | 90   | 1.1975          | 4728256           |
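
The validation loss is a mean per-token cross-entropy (in nats), so it converts to perplexity by exponentiation; the final eval loss of 1.2052 corresponds to a perplexity of roughly 3.34. A one-line check:

```python
# Convert mean cross-entropy loss (nats per token) to perplexity.
import math

print(math.exp(1.2052))  # ≈ 3.34
```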

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1