collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9477
  • Num Input Tokens Seen: 14793756
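
The checkpoint can be loaded like any other causal LM on the Hub. A minimal sketch follows (the repo id is taken from this card; the bfloat16 dtype and device placement are assumptions based on the BF16 checkpoint, and the prompt is arbitrary):

```python
# Minimal sketch: load the fine-tuned checkpoint for generation.
# dtype/device settings are assumptions, not the author's documented usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("Hello, world:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```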

Model description

This model is a fine-tuned variant of google/gemma-2-9b (9.24B parameters, weights stored as BF16 safetensors). No further description has been provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them to `TrainingArguments` follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
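
For reference, the list above corresponds roughly to the following `transformers.TrainingArguments` configuration. This is a sketch, not the author's actual training script; the output directory is hypothetical and the `bf16` flag is an assumption based on the BF16 checkpoint:

```python
# Sketch: the documented hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, based on the BF16 checkpoint
)
```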

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.3013 | 0.0175 | 5 | 1.1349 | 259944 |
| 1.148 | 0.0349 | 10 | 1.0479 | 514560 |
| 0.902 | 0.0524 | 15 | 1.0036 | 778996 |
| 0.7298 | 0.0698 | 20 | 1.0010 | 1038964 |
| 0.6962 | 0.0873 | 25 | 1.0130 | 1300792 |
| 0.5439 | 0.1047 | 30 | 1.0141 | 1557828 |
| 0.5019 | 0.1222 | 35 | 1.0046 | 1817320 |
| 0.4166 | 0.1396 | 40 | 0.9985 | 2069988 |
| 0.4424 | 0.1571 | 45 | 0.9901 | 2333472 |
| 0.4297 | 0.1745 | 50 | 0.9870 | 2600464 |
| 0.4457 | 0.1920 | 55 | 0.9801 | 2854620 |
| 0.495 | 0.2094 | 60 | 0.9794 | 3118804 |
| 0.4569 | 0.2269 | 65 | 0.9781 | 3365668 |
| 0.3777 | 0.2444 | 70 | 0.9738 | 3629244 |
| 0.3982 | 0.2618 | 75 | 0.9730 | 3897748 |
| 0.4096 | 0.2793 | 80 | 0.9705 | 4158168 |
| 0.3907 | 0.2967 | 85 | 0.9704 | 4410788 |
| 0.4164 | 0.3142 | 90 | 0.9673 | 4666960 |
| 0.4496 | 0.3316 | 95 | 0.9672 | 4931648 |
| 0.337 | 0.3491 | 100 | 0.9659 | 5191760 |
| 0.5405 | 0.3665 | 105 | 0.9639 | 5456488 |
| 0.484 | 0.3840 | 110 | 0.9637 | 5719168 |
| 0.4114 | 0.4014 | 115 | 0.9631 | 5975456 |
| 0.4027 | 0.4189 | 120 | 0.9625 | 6235256 |
| 0.3754 | 0.4363 | 125 | 0.9601 | 6491744 |
| 0.3875 | 0.4538 | 130 | 0.9617 | 6753820 |
| 0.3731 | 0.4713 | 135 | 0.9610 | 7011036 |
| 0.3216 | 0.4887 | 140 | 0.9580 | 7269372 |
| 0.4588 | 0.5062 | 145 | 0.9609 | 7522300 |
| 0.3542 | 0.5236 | 150 | 0.9578 | 7781528 |
| 0.4457 | 0.5411 | 155 | 0.9561 | 8041692 |
| 0.3787 | 0.5585 | 160 | 0.9582 | 8297540 |
| 0.3757 | 0.5760 | 165 | 0.9581 | 8554200 |
| 0.2727 | 0.5934 | 170 | 0.9550 | 8806200 |
| 0.4217 | 0.6109 | 175 | 0.9556 | 9061392 |
| 0.3614 | 0.6283 | 180 | 0.9542 | 9325600 |
| 0.3785 | 0.6458 | 185 | 0.9539 | 9584028 |
| 0.376 | 0.6632 | 190 | 0.9538 | 9843748 |
| 0.3718 | 0.6807 | 195 | 0.9543 | 10098676 |
| 0.3875 | 0.6982 | 200 | 0.9544 | 10361676 |
| 0.4865 | 0.7156 | 205 | 0.9530 | 10613476 |
| 0.3704 | 0.7331 | 210 | 0.9536 | 10873088 |
| 0.3826 | 0.7505 | 215 | 0.9526 | 11136952 |
| 0.4034 | 0.7680 | 220 | 0.9506 | 11391732 |
| 0.4117 | 0.7854 | 225 | 0.9510 | 11646964 |
| 0.4504 | 0.8029 | 230 | 0.9517 | 11902304 |
| 0.3987 | 0.8203 | 235 | 0.9498 | 12158984 |
| 0.3092 | 0.8378 | 240 | 0.9497 | 12418988 |
| 0.4653 | 0.8552 | 245 | 0.9518 | 12686188 |
| 0.3395 | 0.8727 | 250 | 0.9529 | 12946236 |
| 0.4376 | 0.8901 | 255 | 0.9503 | 13199912 |
| 0.3509 | 0.9076 | 260 | 0.9484 | 13460552 |
| 0.4473 | 0.9251 | 265 | 0.9504 | 13725908 |
| 0.3915 | 0.9425 | 270 | 0.9495 | 13977760 |
| 0.3943 | 0.9600 | 275 | 0.9485 | 14235544 |
| 0.3339 | 0.9774 | 280 | 0.9490 | 14488232 |
| 0.3577 | 0.9949 | 285 | 0.9479 | 14744980 |
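
The table shows validation loss dropping sharply over the first ~20 optimizer steps and then plateauing near 0.95. An illustrative way to visualize this, using a subset of rows copied verbatim from the table above (the matplotlib usage is a sketch, not part of the original card):

```python
# Illustrative plot: validation loss vs. optimizer step, subset of table rows.
import matplotlib.pyplot as plt

steps = [0, 5, 10, 20, 50, 100, 150, 200, 250, 285]
val_loss = [1.2335, 1.1349, 1.0479, 1.0010, 0.9870,
            0.9659, 0.9578, 0.9544, 0.9529, 0.9479]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Optimizer step")
plt.ylabel("Validation loss")
plt.title("Validation loss plateaus near 0.95 after ~20 steps")
plt.show()
```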

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
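
To approximate this environment, the pinned versions can be installed with, e.g., `pip install transformers==4.44.0 torch==2.4.0 datasets==2.20.0 tokenizers==0.19.1`; note that the PyTorch build used here (2.4.0+cu121) may require a platform-specific CUDA wheel.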