collapse_gemma-2-27b_hs2_replace_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2906
  • Num Input Tokens Seen: 3849248
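
To put the evaluation loss on a more familiar scale, it can be converted to a perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state; under that assumption the value works out to roughly 3.6.

```python
import math

# Evaluation loss copied from the results listed above.
eval_loss = 1.2906

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss). The card itself does not confirm this.
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss} -> perplexity {perplexity:.3f}")
```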

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
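
The relationship between these values can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the total batch size is the per-device batch size times the accumulation steps times the number of devices, and it models `constant_with_warmup` as a linear ramp followed by a flat rate. `ASSUMED_TOTAL_STEPS` is a hypothetical estimate (about 76 optimizer steps for the single epoch), not a figure stated on this card.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 4
gradient_accumulation_steps = 32
total_train_batch_size = 128
warmup_ratio = 0.05

# Assumption: total = per-device batch * accumulation steps * device count.
implied_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# Hypothetical estimate of optimizer steps in the one epoch (not from the card).
ASSUMED_TOTAL_STEPS = 76


def constant_with_warmup_lr(step, total_steps=ASSUMED_TOTAL_STEPS):
    """Linear ramp from 0 to learning_rate, then constant (a sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print("implied devices:", implied_devices)
for step in (0, 2, 4, 75):
    print(step, constant_with_warmup_lr(step))
```

Under these assumptions the warmup covers only the first few optimizer steps, after which the rate stays at 8e-06 for the rest of the epoch.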

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 3.9495        | 0.0656 | 5    | 1.0675          | 257040            |
| 3.6648        | 0.1311 | 10   | 1.1421          | 509700            |
| 3.4127        | 0.1967 | 15   | 1.1791          | 758008            |
| 3.3226        | 0.2623 | 20   | 1.2243          | 1018792           |
| 3.0561        | 0.3279 | 25   | 1.2610          | 1268784           |
| 2.9523        | 0.3934 | 30   | 1.2649          | 1519000           |
| 2.7744        | 0.4590 | 35   | 1.2769          | 1772404           |
| 2.7413        | 0.5246 | 40   | 1.2763          | 2028800           |
| 2.554         | 0.5902 | 45   | 1.2870          | 2281292           |
| 2.4988        | 0.6557 | 50   | 1.2886          | 2529200           |
| 2.6288        | 0.7213 | 55   | 1.2899          | 2791768           |
| 2.4897        | 0.7869 | 60   | 1.2956          | 3044012           |
| 2.6199        | 0.8525 | 65   | 1.2915          | 3299772           |
| 2.4062        | 0.9180 | 70   | 1.2969          | 3550872           |
| 2.2421        | 0.9836 | 75   | 1.2894          | 3799436           |
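
A few summary figures can be read off these rows directly. The sketch below copies the step, validation loss, and token columns from the results above, then reports the step with the lowest validation loss, the change in validation loss between the first and last logged rows, and the average number of input tokens per optimizer step. The variable names are illustrative, and the per-step average is only a rough figure because it divides by logged steps rather than by exact batch contents.

```python
# (step, validation_loss, input_tokens_seen) copied from the results above.
rows = [
    (0, 1.1282, 0),
    (5, 1.0675, 257040),
    (10, 1.1421, 509700),
    (15, 1.1791, 758008),
    (20, 1.2243, 1018792),
    (25, 1.2610, 1268784),
    (30, 1.2649, 1519000),
    (35, 1.2769, 1772404),
    (40, 1.2763, 2028800),
    (45, 1.2870, 2281292),
    (50, 1.2886, 2529200),
    (55, 1.2899, 2791768),
    (60, 1.2956, 3044012),
    (65, 1.2915, 3299772),
    (70, 1.2969, 3550872),
    (75, 1.2894, 3799436),
]

# Row with the lowest validation loss.
best_step, best_val_loss, _ = min(rows, key=lambda r: r[1])

# Change in validation loss between the first and last logged rows.
val_loss_change = rows[-1][1] - rows[0][1]

# Average input tokens consumed per optimizer step up to the last row.
last_step, _, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

print(f"lowest validation loss {best_val_loss} at step {best_step}")
print(f"validation loss change, first to last row: {val_loss_change:+.4f}")
print(f"average input tokens per step: {tokens_per_step:.0f}")
```

On these numbers the validation loss is lowest at the first logged checkpoint after step 0 and ends higher than it started, while the training loss column falls throughout the epoch.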

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 27.2B params
  • Tensor type: BF16