collapse_gemma-2-2b_hs2_replace_iter7_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3794
  • Num input tokens seen: 4,746,800

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
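Assuming a standard Hugging Face `Trainer` setup (the card does not include the training script), the hyperparameters above would map onto `transformers.TrainingArguments` fields roughly as follows; shown as a plain dict so the mapping stays explicit and the batch-size arithmetic is visible:

```python
# Hypothetical mapping of the listed hyperparameters onto
# transformers.TrainingArguments field names (a sketch, not the actual script).
training_args = dict(
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)

# The effective (total) train batch size is the per-device batch size
# multiplied by the gradient accumulation steps.
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)  # 128, matching total_train_batch_size above
```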

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.5554          0.0511   5      1.2763            236632
0.962           0.1022   10     1.2791            482072
0.5465          0.1533   15     1.5351            717336
0.2577          0.2043   20     1.7737            963104
0.1106          0.2554   25     1.9109            1209072
0.0893          0.3065   30     2.1127            1459312
0.042           0.3576   35     2.2649            1704152
0.0359          0.4087   40     2.3544            1951208
0.0263          0.4598   45     2.4350            2199120
0.0284          0.5109   50     2.4259            2442720
0.0274          0.5619   55     2.4235            2682600
0.0269          0.6130   60     2.4021            2921864
0.0234          0.6641   65     2.3916            3165080
0.0359          0.7152   70     2.3877            3404192
0.0227          0.7663   75     2.3636            3652912
0.0239          0.8174   80     2.3615            3906608
0.0224          0.8685   85     2.3580            4151080
0.0226          0.9195   90     2.3604            4401936
0.0293          0.9706   95     2.3700            4644944
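As a quick sanity check, the "Input Tokens Seen" column is consistent with the stated effective batch size of 128: dividing tokens seen by optimizer steps, and then by the batch size, gives an implied average example length. The snippet below uses only numbers from the table and hyperparameter list above:

```python
# Relate the final logged row (step 95, 4,644,944 tokens seen) to the
# effective train batch size of 128 from the hyperparameters section.
tokens_seen, steps, total_train_batch_size = 4_644_944, 95, 128

tokens_per_step = tokens_seen / steps
avg_example_len = tokens_per_step / total_train_batch_size
print(f"~{tokens_per_step:.0f} tokens/step, ~{avg_example_len:.0f} tokens/example")
# ~48894 tokens/step, ~382 tokens/example
```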

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model files

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter7_sftsd2

  • Base model: google/gemma-2-2b