collapse_gemma-2-2b_hs2_replace_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3509
  • Num Input Tokens Seen: 4891248

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
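The listed total_train_batch_size follows from the per-device batch size and the gradient-accumulation setting. A minimal sketch in plain Python (values copied from the list above) verifying that arithmetic:

```python
# Hyperparameters copied from the list above.
train_batch_size = 8             # per-device train batch size
gradient_accumulation_steps = 16

# Effective (total) train batch size: one optimizer step is taken only
# after gradients have been accumulated over this many examples.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 128, matching the listed value
```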

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.6111          0.0511   5      1.2746            256472
0.9464          0.1022   10     1.2781            505824
0.5045          0.1534   15     1.5132            755440
0.2581          0.2045   20     1.7434            1005848
0.1646          0.2556   25     1.9292            1255320
0.0521          0.3067   30     2.1045            1503448
0.0656          0.3578   35     2.2305            1753944
0.0534          0.4089   40     2.3242            2008656
0.0296          0.4601   45     2.3765            2257912
0.0334          0.5112   50     2.3208            2511848
0.0325          0.5623   55     2.2550            2768424
0.0526          0.6134   60     2.2720            3026208
0.0252          0.6645   65     2.2846            3281624
0.0247          0.7157   70     2.2932            3536472
0.0241          0.7668   75     2.3163            3787728
0.0238          0.8179   80     2.3276            4041280
0.0228          0.8690   85     2.3375            4291440
0.0234          0.9201   90     2.3306            4542072
0.0243          0.9712   95     2.3427            4793768
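Reading the table: validation loss bottoms out very early (step 5) and then climbs while training loss keeps falling, which suggests the later checkpoints overfit. A small sketch, with the (step, validation loss) pairs transcribed from the table above, that finds the best checkpoint:

```python
# (step, validation_loss) pairs transcribed from the table above.
val_losses = [
    (0, 1.3909), (5, 1.2746), (10, 1.2781), (15, 1.5132), (20, 1.7434),
    (25, 1.9292), (30, 2.1045), (35, 2.2305), (40, 2.3242), (45, 2.3765),
    (50, 2.3208), (55, 2.2550), (60, 2.2720), (65, 2.2846), (70, 2.2932),
    (75, 2.3163), (80, 2.3276), (85, 2.3375), (90, 2.3306), (95, 2.3427),
]

# Checkpoint with the lowest validation loss.
best_step, best_loss = min(val_losses, key=lambda p: p[1])
print(best_step, best_loss)  # → 5 1.2746
```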

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1