collapse_gemma-2-27b_hs2_replace_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5023
  • Num Input Tokens Seen: 4234420
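
For readers who prefer perplexity to raw loss, the conversion can be sketched as below. This assumes the reported loss is a mean per-token cross-entropy in nats, which is typical for causal language model fine-tuning with Transformers but is not stated in this card.

```python
import math

# Evaluation loss as reported above.
eval_loss = 1.5023

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ≈ {perplexity:.2f}")  # about 4.49
```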

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
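
The relationship between the batch-size entries above can be checked with a short sketch. It assumes the effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices; the device count is not reported in this card and is inferred here.

```python
# Values as reported in the hyperparameter list above.
train_batch_size = 4                # per-device micro-batch size
gradient_accumulation_steps = 32
total_train_batch_size = 128        # reported effective batch size

# Assumption: total = per-device batch * accumulation steps * devices.
# The device count is not stated in the card; it is inferred from the numbers.
per_device_effective = train_batch_size * gradient_accumulation_steps
inferred_num_devices = total_train_batch_size // per_device_effective

print(per_device_effective)   # 128
print(inferred_num_devices)   # 1
```

Under that assumption, the reported numbers are consistent with training on a single process, with each optimizer step accumulating 32 micro-batches of 4 sequences.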

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 3.7147        | 0.0561 | 5    | 1.0496          | 244372            |
| 3.235         | 0.1122 | 10   | 1.1099          | 481484            |
| 2.4971        | 0.1683 | 15   | 1.1474          | 719068            |
| 1.9063        | 0.2244 | 20   | 1.2046          | 947552            |
| 1.4217        | 0.2805 | 25   | 1.2617          | 1186008           |
| 1.0419        | 0.3366 | 30   | 1.2732          | 1423708           |
| 0.5457        | 0.3927 | 35   | 1.2704          | 1663424           |
| 0.5577        | 0.4488 | 40   | 1.2656          | 1897532           |
| 0.4695        | 0.5049 | 45   | 1.2629          | 2141400           |
| 0.4864        | 0.5610 | 50   | 1.2747          | 2376896           |
| 0.4432        | 0.6171 | 55   | 1.2721          | 2618200           |
| 0.3554        | 0.6732 | 60   | 1.3097          | 2863148           |
| 0.3316        | 0.7293 | 65   | 1.3176          | 3094748           |
| 0.242         | 0.7854 | 70   | 1.3384          | 3336248           |
| 0.2209        | 0.8415 | 75   | 1.3945          | 3565560           |
| 0.248         | 0.8976 | 80   | 1.4152          | 3799244           |
| 0.2728        | 0.9537 | 85   | 1.4084          | 4041904           |
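
The trend in these results can be summarized with a small sketch that finds the checkpoint with the lowest validation loss. The (step, validation loss) pairs are copied from the results above; the variable names are illustrative only and are not part of any training script.

```python
# (step, validation_loss) pairs copied from the training results above.
val_history = [
    (0, 1.1282), (5, 1.0496), (10, 1.1099), (15, 1.1474), (20, 1.2046),
    (25, 1.2617), (30, 1.2732), (35, 1.2704), (40, 1.2656), (45, 1.2629),
    (50, 1.2747), (55, 1.2721), (60, 1.3097), (65, 1.3176), (70, 1.3384),
    (75, 1.3945), (80, 1.4152), (85, 1.4084),
]

# Pick the evaluation point with the smallest validation loss.
best_step, best_loss = min(val_history, key=lambda row: row[1])
last_step, last_loss = val_history[-1]

print(best_step, best_loss)              # 5 1.0496
print(round(last_loss - best_loss, 4))   # 0.3588
```

Validation loss is lowest at step 5 and is higher at every later evaluation, while training loss keeps falling, so the logged numbers show the two curves diverging over the epoch.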

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 27.2B params (Safetensors)
Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_replace_iter3_sftsd1

Base model: google/gemma-2-27b
Fine-tuned: this model