# collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9477
- Num Input Tokens Seen: 14793756
## Model description
More information needed
## Intended uses & limitations
More information needed
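
Pending documented usage guidance, below is a minimal sketch of loading this checkpoint for causal-LM inference with the standard Transformers API. The repo id is taken from this card; the dtype, device placement, and generation settings are illustrative assumptions, not documented choices.

```python
# A minimal, hedged sketch of loading this checkpoint for inference.
# dtype/device/generation settings are assumptions, not from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16, as is typical for Gemma-2
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```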
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
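
As a convenience, the sketch below maps the hyperparameters above onto the `transformers.TrainingArguments` API. The `output_dir` is a placeholder, and the model/dataset wiring is omitted; this is one plausible reconstruction, not the authors' actual training script.

```python
# A hedged sketch of the hyperparameters above, expressed as TrainingArguments.
# output_dir is a placeholder; model/dataset setup is not part of the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 per device * 32 steps = 128 total batch
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```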
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.2335 | 0 |
1.3013 | 0.0175 | 5 | 1.1349 | 259944 |
1.148 | 0.0349 | 10 | 1.0479 | 514560 |
0.902 | 0.0524 | 15 | 1.0036 | 778996 |
0.7298 | 0.0698 | 20 | 1.0010 | 1038964 |
0.6962 | 0.0873 | 25 | 1.0130 | 1300792 |
0.5439 | 0.1047 | 30 | 1.0141 | 1557828 |
0.5019 | 0.1222 | 35 | 1.0046 | 1817320 |
0.4166 | 0.1396 | 40 | 0.9985 | 2069988 |
0.4424 | 0.1571 | 45 | 0.9901 | 2333472 |
0.4297 | 0.1745 | 50 | 0.9870 | 2600464 |
0.4457 | 0.1920 | 55 | 0.9801 | 2854620 |
0.495 | 0.2094 | 60 | 0.9794 | 3118804 |
0.4569 | 0.2269 | 65 | 0.9781 | 3365668 |
0.3777 | 0.2444 | 70 | 0.9738 | 3629244 |
0.3982 | 0.2618 | 75 | 0.9730 | 3897748 |
0.4096 | 0.2793 | 80 | 0.9705 | 4158168 |
0.3907 | 0.2967 | 85 | 0.9704 | 4410788 |
0.4164 | 0.3142 | 90 | 0.9673 | 4666960 |
0.4496 | 0.3316 | 95 | 0.9672 | 4931648 |
0.337 | 0.3491 | 100 | 0.9659 | 5191760 |
0.5405 | 0.3665 | 105 | 0.9639 | 5456488 |
0.484 | 0.3840 | 110 | 0.9637 | 5719168 |
0.4114 | 0.4014 | 115 | 0.9631 | 5975456 |
0.4027 | 0.4189 | 120 | 0.9625 | 6235256 |
0.3754 | 0.4363 | 125 | 0.9601 | 6491744 |
0.3875 | 0.4538 | 130 | 0.9617 | 6753820 |
0.3731 | 0.4713 | 135 | 0.9610 | 7011036 |
0.3216 | 0.4887 | 140 | 0.9580 | 7269372 |
0.4588 | 0.5062 | 145 | 0.9609 | 7522300 |
0.3542 | 0.5236 | 150 | 0.9578 | 7781528 |
0.4457 | 0.5411 | 155 | 0.9561 | 8041692 |
0.3787 | 0.5585 | 160 | 0.9582 | 8297540 |
0.3757 | 0.5760 | 165 | 0.9581 | 8554200 |
0.2727 | 0.5934 | 170 | 0.9550 | 8806200 |
0.4217 | 0.6109 | 175 | 0.9556 | 9061392 |
0.3614 | 0.6283 | 180 | 0.9542 | 9325600 |
0.3785 | 0.6458 | 185 | 0.9539 | 9584028 |
0.376 | 0.6632 | 190 | 0.9538 | 9843748 |
0.3718 | 0.6807 | 195 | 0.9543 | 10098676 |
0.3875 | 0.6982 | 200 | 0.9544 | 10361676 |
0.4865 | 0.7156 | 205 | 0.9530 | 10613476 |
0.3704 | 0.7331 | 210 | 0.9536 | 10873088 |
0.3826 | 0.7505 | 215 | 0.9526 | 11136952 |
0.4034 | 0.7680 | 220 | 0.9506 | 11391732 |
0.4117 | 0.7854 | 225 | 0.9510 | 11646964 |
0.4504 | 0.8029 | 230 | 0.9517 | 11902304 |
0.3987 | 0.8203 | 235 | 0.9498 | 12158984 |
0.3092 | 0.8378 | 240 | 0.9497 | 12418988 |
0.4653 | 0.8552 | 245 | 0.9518 | 12686188 |
0.3395 | 0.8727 | 250 | 0.9529 | 12946236 |
0.4376 | 0.8901 | 255 | 0.9503 | 13199912 |
0.3509 | 0.9076 | 260 | 0.9484 | 13460552 |
0.4473 | 0.9251 | 265 | 0.9504 | 13725908 |
0.3915 | 0.9425 | 270 | 0.9495 | 13977760 |
0.3943 | 0.9600 | 275 | 0.9485 | 14235544 |
0.3339 | 0.9774 | 280 | 0.9490 | 14488232 |
0.3577 | 0.9949 | 285 | 0.9479 | 14744980 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1