---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1
  results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9448
- Num Input Tokens Seen: 14395896

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
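For reference, here is a minimal sketch (not the exact training script) of how these hyperparameters might map onto `transformers.TrainingArguments`; the `trl`/`sft` tags suggest training went through TRL's `SFTTrainer`. The `output_dir` is an assumption, and the training dataset is not documented in this card.

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
# output_dir is an assumption; the dataset used for training is unknown.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-06,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio: 0.05
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas/epsilon match the
    adam_beta2=0.999,                # AdamW defaults listed above
    adam_epsilon=1e-08,
)
```

With `per_device_train_batch_size=4` and `gradient_accumulation_steps=32` on a single device, the effective batch size works out to the listed total_train_batch_size of 128.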
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.2638 | 0.0178 | 5 | 1.1350 | 263564 |
| 1.088 | 0.0356 | 10 | 1.0496 | 522096 |
| 0.9278 | 0.0534 | 15 | 1.0059 | 770488 |
| 0.7003 | 0.0712 | 20 | 1.0038 | 1030136 |
| 0.6094 | 0.0889 | 25 | 1.0117 | 1293096 |
| 0.5915 | 0.1067 | 30 | 1.0084 | 1544952 |
| 0.571 | 0.1245 | 35 | 1.0023 | 1798880 |
| 0.4553 | 0.1423 | 40 | 1.0002 | 2052424 |
| 0.4776 | 0.1601 | 45 | 0.9951 | 2308216 |
| 0.4561 | 0.1779 | 50 | 0.9884 | 2565080 |
| 0.4392 | 0.1957 | 55 | 0.9841 | 2825996 |
| 0.4753 | 0.2135 | 60 | 0.9797 | 3082260 |
| 0.4597 | 0.2313 | 65 | 0.9759 | 3328388 |
| 0.436 | 0.2491 | 70 | 0.9738 | 3584552 |
| 0.3907 | 0.2668 | 75 | 0.9703 | 3839180 |
| 0.4001 | 0.2846 | 80 | 0.9676 | 4100568 |
| 0.4112 | 0.3024 | 85 | 0.9671 | 4356852 |
| 0.4249 | 0.3202 | 90 | 0.9659 | 4610688 |
| 0.3945 | 0.3380 | 95 | 0.9654 | 4859752 |
| 0.5615 | 0.3558 | 100 | 0.9627 | 5108284 |
| 0.3528 | 0.3736 | 105 | 0.9619 | 5363428 |
| 0.3511 | 0.3914 | 110 | 0.9629 | 5623372 |
| 0.3744 | 0.4092 | 115 | 0.9600 | 5876016 |
| 0.4473 | 0.4270 | 120 | 0.9598 | 6139008 |
| 0.465 | 0.4447 | 125 | 0.9595 | 6392720 |
| 0.4511 | 0.4625 | 130 | 0.9568 | 6655704 |
| 0.3273 | 0.4803 | 135 | 0.9570 | 6909620 |
| 0.3689 | 0.4981 | 140 | 0.9575 | 7163740 |
| 0.3782 | 0.5159 | 145 | 0.9551 | 7424140 |
| 0.4371 | 0.5337 | 150 | 0.9541 | 7682936 |
| 0.3295 | 0.5515 | 155 | 0.9543 | 7939780 |
| 0.3631 | 0.5693 | 160 | 0.9533 | 8196216 |
| 0.4747 | 0.5871 | 165 | 0.9532 | 8457568 |
| 0.4171 | 0.6048 | 170 | 0.9545 | 8708980 |
| 0.4043 | 0.6226 | 175 | 0.9535 | 8963244 |
| 0.3966 | 0.6404 | 180 | 0.9523 | 9216124 |
| 0.487 | 0.6582 | 185 | 0.9520 | 9470216 |
| 0.4243 | 0.6760 | 190 | 0.9523 | 9726172 |
| 0.338 | 0.6938 | 195 | 0.9505 | 9978316 |
| 0.3794 | 0.7116 | 200 | 0.9510 | 10237320 |
| 0.4474 | 0.7294 | 205 | 0.9515 | 10498692 |
| 0.498 | 0.7472 | 210 | 0.9510 | 10755164 |
| 0.3557 | 0.7650 | 215 | 0.9505 | 11013492 |
| 0.3772 | 0.7827 | 220 | 0.9503 | 11263256 |
| 0.4487 | 0.8005 | 225 | 0.9509 | 11524460 |
| 0.3492 | 0.8183 | 230 | 0.9481 | 11776848 |
| 0.4046 | 0.8361 | 235 | 0.9483 | 12034428 |
| 0.3995 | 0.8539 | 240 | 0.9484 | 12301540 |
| 0.345 | 0.8717 | 245 | 0.9485 | 12558184 |
| 0.3618 | 0.8895 | 250 | 0.9476 | 12818680 |
| 0.286 | 0.9073 | 255 | 0.9476 | 13077536 |
| 0.368 | 0.9251 | 260 | 0.9487 | 13332544 |
| 0.3742 | 0.9429 | 265 | 0.9456 | 13585628 |
| 0.4091 | 0.9606 | 270 | 0.9465 | 13838300 |
| 0.3315 | 0.9784 | 275 | 0.9469 | 14090880 |
| 0.3664 | 0.9962 | 280 | 0.9449 | 14344624 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
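### How to load (sketch)

A minimal inference sketch, assuming the checkpoint is hosted on the Hub under the model name above; `<user>` is a placeholder for the actual namespace, which is not given in this card, and the dtype/device choices are illustrative rather than requirements.

```python
# Sketch only: load the fine-tuned checkpoint for inference.
# "<user>" is a placeholder for the Hub namespace hosting this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<user>/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; Gemma 2 is commonly run in bf16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("Hello, ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```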