collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9280
  • Num Input Tokens Seen: 13412700
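For quick testing, the checkpoint can be loaded with the standard transformers API. This is a minimal sketch, not an official usage recipe: the repo id is the one this card is published under, and the `torch_dtype`/`device_map` settings are assumptions rather than documented requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # assumption: matches the BF16 checkpoint weights
    device_map="auto",       # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```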

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
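Expressed as Hugging Face `TrainingArguments`, the list above corresponds to roughly the following sketch. `output_dir` and `bf16` are assumptions (the card does not state them); every other value mirrors the list.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: matches the model's BF16 tensor type
)
```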

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.1585        | 0.0187 | 5    | 1.0516          | 253472            |
| 2.0366        | 0.0374 | 10   | 0.9878          | 506396            |
| 2.2853        | 0.0562 | 15   | 0.9800          | 760944            |
| 1.9353        | 0.0749 | 20   | 0.9748          | 1012816           |
| 1.7788        | 0.0936 | 25   | 0.9765          | 1258660           |
| 1.5677        | 0.1123 | 30   | 0.9865          | 1505980           |
| 1.6266        | 0.1310 | 35   | 0.9797          | 1748944           |
| 1.3893        | 0.1498 | 40   | 0.9770          | 1996076           |
| 1.3214        | 0.1685 | 45   | 0.9758          | 2249964           |
| 1.2104        | 0.1872 | 50   | 0.9732          | 2502428           |
| 1.1943        | 0.2059 | 55   | 0.9673          | 2758156           |
| 0.9618        | 0.2246 | 60   | 0.9648          | 3002952           |
| 0.9917        | 0.2434 | 65   | 0.9608          | 3250420           |
| 0.9458        | 0.2621 | 70   | 0.9592          | 3498588           |
| 0.8799        | 0.2808 | 75   | 0.9541          | 3753220           |
| 0.9288        | 0.2995 | 80   | 0.9547          | 4005744           |
| 0.9042        | 0.3182 | 85   | 0.9524          | 4251648           |
| 0.7466        | 0.3370 | 90   | 0.9507          | 4504748           |
| 0.802         | 0.3557 | 95   | 0.9492          | 4759604           |
| 0.786         | 0.3744 | 100  | 0.9468          | 5010224           |
| 0.8059        | 0.3931 | 105  | 0.9463          | 5261388           |
| 0.7014        | 0.4118 | 110  | 0.9448          | 5508984           |
| 0.7977        | 0.4306 | 115  | 0.9438          | 5767344           |
| 0.9226        | 0.4493 | 120  | 0.9425          | 6015220           |
| 0.9092        | 0.4680 | 125  | 0.9414          | 6270096           |
| 0.692         | 0.4867 | 130  | 0.9401          | 6522928           |
| 0.7488        | 0.5054 | 135  | 0.9394          | 6774308           |
| 0.6813        | 0.5242 | 140  | 0.9378          | 7026956           |
| 0.9565        | 0.5429 | 145  | 0.9353          | 7281764           |
| 0.7867        | 0.5616 | 150  | 0.9364          | 7535708           |
| 0.6354        | 0.5803 | 155  | 0.9373          | 7783224           |
| 0.8341        | 0.5990 | 160  | 0.9340          | 8026812           |
| 0.834         | 0.6178 | 165  | 0.9358          | 8276260           |
| 0.7364        | 0.6365 | 170  | 0.9338          | 8529636           |
| 0.7822        | 0.6552 | 175  | 0.9329          | 8787372           |
| 0.8144        | 0.6739 | 180  | 0.9337          | 9033612           |
| 0.7588        | 0.6926 | 185  | 0.9321          | 9283952           |
| 0.6757        | 0.7114 | 190  | 0.9320          | 9528272           |
| 0.5925        | 0.7301 | 195  | 0.9327          | 9775216           |
| 0.6711        | 0.7488 | 200  | 0.9321          | 10031428          |
| 0.7888        | 0.7675 | 205  | 0.9301          | 10287112          |
| 0.7551        | 0.7862 | 210  | 0.9322          | 10539552          |
| 0.7367        | 0.8050 | 215  | 0.9328          | 10786728          |
| 0.6682        | 0.8237 | 220  | 0.9318          | 11033040          |
| 0.7802        | 0.8424 | 225  | 0.9310          | 11281864          |
| 0.7423        | 0.8611 | 230  | 0.9317          | 11537232          |
| 0.8502        | 0.8798 | 235  | 0.9309          | 11791856          |
| 0.7691        | 0.8986 | 240  | 0.9283          | 12041012          |
| 0.7173        | 0.9173 | 245  | 0.9318          | 12291188          |
| 0.7158        | 0.9360 | 250  | 0.9296          | 12542864          |
| 0.7733        | 0.9547 | 255  | 0.9307          | 12794508          |
| 0.6864        | 0.9734 | 260  | 0.9298          | 13055348          |
| 0.6458        | 0.9922 | 265  | 0.9288          | 13306708          |
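As a rough sanity check on the table (assuming the columns are as labeled), the token counts imply about 50k input tokens per optimizer step, i.e. roughly 390 tokens per sequence at the total train batch size of 128:

```python
# Bookkeeping from the final row of the table above.
tokens_seen = 13_306_708  # Input Tokens Seen at step 265
steps = 265
tokens_per_step = tokens_seen / steps   # ~50,214 tokens per optimizer step
tokens_per_seq = tokens_per_step / 128  # ~392 tokens per sequence (batch 128)
print(f"{tokens_per_step:.0f} tokens/step, {tokens_per_seq:.0f} tokens/sequence")
```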

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
