collapse_gemma-2-2b_hs2_replace_iter7_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4308
  • Num Input Tokens Seen: 4841952

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
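
The effective batch size follows from the per-device batch size and the gradient accumulation steps; a quick arithmetic check (assuming single-device training, which the card does not state):

```python
# Hyperparameters from the list above.
train_batch_size = 8             # per-device micro-batch
gradient_accumulation_steps = 16
num_devices = 1                  # assumption: single GPU

# One optimizer step accumulates this many examples.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # matches total_train_batch_size: 128
```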

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.5346          0.0516   5      1.2786            260208
0.9281          0.1032   10     1.2957            512616
0.5962          0.1547   15     1.5282            766320
0.2905          0.2063   20     1.7105            1018352
0.2044          0.2579   25     1.9340            1270808
0.1128          0.3095   30     2.0841            1527056
0.0634          0.3611   35     2.2219            1778088
0.0420          0.4126   40     2.3269            2022528
0.0308          0.4642   45     2.3926            2280096
0.0302          0.5158   50     2.4223            2535736
0.0273          0.5674   55     2.4248            2778432
0.0238          0.6190   60     2.4126            3034168
0.0253          0.6705   65     2.4444            3284112
0.0236          0.7221   70     2.4540            3539984
0.0261          0.7737   75     2.4524            3791064
0.0253          0.8253   80     2.4419            4033320
0.0264          0.8769   85     2.4440            4285000
0.0253          0.9284   90     2.4517            4543120
0.0253          0.9800   95     2.4359            4791856
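
The Input Tokens Seen column grows roughly linearly with step, so the tokens consumed per optimizer step can be estimated directly from the log; a small sanity check (values read off the table above, with the 128-sequence effective batch taken from the hyperparameters section):

```python
# Last logged checkpoint: step 95, 4,791,856 input tokens seen.
tokens_seen = 4_791_856
steps = 95
tokens_per_step = tokens_seen / steps   # roughly 50,441 tokens per optimizer step

# With an effective batch of 128 sequences per step, the implied
# average (padded) sequence length is about 394 tokens.
avg_seq_len = tokens_per_step / 128
print(round(tokens_per_step), round(avg_seq_len))
```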

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
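
To reproduce this environment, the versions above can be pinned in a requirements file (a sketch; the `+cu121` build of PyTorch additionally requires installing from the matching CUDA wheel index):

```text
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```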
Model details

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter7_sftsd0

  • Base model: google/gemma-2-2b
  • This model: finetuned from the base model