collapse_gemma-2-2b_hs2_replace_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2688
  • Num Input Tokens Seen: 5013656
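The checkpoint can be loaded like any other Hub model. A minimal usage sketch (the repo id is taken from this card; the generation settings and helper name are illustrative assumptions, and imports are deferred so the sketch can be inspected without Transformers installed):

```python
REPO_ID = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter5_sftsd1"

def load_model(repo_id: str = REPO_ID):
    """Download the tokenizer and BF16 weights from the Hub.

    Requires `transformers` and `torch`, network access, and enough RAM
    for a 2.6B-parameter model; imports are local to keep the sketch light.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
    inputs = tokenizer("The quick brown fox", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Loading in BF16 matches the tensor type the checkpoint was saved in.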

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
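As a sanity check, the reported total train batch size follows directly from the per-device batch size and the gradient accumulation steps (a minimal sketch; values copied from the list above):

```python
# Effective (total) train batch size = per-device batch size x accumulation steps.
# With a single device, 8 samples per step accumulated over 16 steps
# gives one optimizer update per 128 samples.
train_batch_size = 8
gradient_accumulation_steps = 16

total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # -> 128, matching total_train_batch_size above
```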

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4559        | 0.0513 | 5    | 1.2725          | 257864            |
| 1.0605        | 0.1027 | 10   | 1.2385          | 516944            |
| 0.6871        | 0.1540 | 15   | 1.4054          | 773056            |
| 0.4263        | 0.2054 | 20   | 1.6020          | 1039416           |
| 0.2328        | 0.2567 | 25   | 1.7950          | 1289960           |
| 0.158         | 0.3081 | 30   | 1.9368          | 1547184           |
| 0.0827        | 0.3594 | 35   | 2.1794          | 1796888           |
| 0.0603        | 0.4108 | 40   | 2.1921          | 2057712           |
| 0.0448        | 0.4621 | 45   | 2.3177          | 2314536           |
| 0.0473        | 0.5135 | 50   | 2.3247          | 2574664           |
| 0.033         | 0.5648 | 55   | 2.3309          | 2830688           |
| 0.0254        | 0.6162 | 60   | 2.3438          | 3090024           |
| 0.0281        | 0.6675 | 65   | 2.3396          | 3351576           |
| 0.0257        | 0.7189 | 70   | 2.3013          | 3611544           |
| 0.0235        | 0.7702 | 75   | 2.2895          | 3870096           |
| 0.029         | 0.8216 | 80   | 2.3117          | 4131664           |
| 0.0278        | 0.8729 | 85   | 2.2845          | 4389008           |
| 0.0309        | 0.9243 | 90   | 2.2608          | 4654528           |
| 0.0246        | 0.9756 | 95   | 2.2600          | 4907632           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 2.61B params (Safetensors, BF16)
