---
base_model: TII-Frontier-Team/falcon3-3b-instruct
datasets:
- TII-Frontier-Team/Reasoning_DPO
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [TII-Frontier-Team/PEFT-falcon3b-it-gsm8k](https://huggingface.co/TII-Frontier-Team/PEFT-falcon3b-it-gsm8k) on the TII-Frontier-Team/Reasoning_DPO dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0286
- Rewards/chosen: -4.7078
- Rewards/rejected: -10.6652
- Rewards/accuracies: 0.9254
- Rewards/margins: 5.9575
- Logps/rejected: -1102.4209
- Logps/chosen: -503.5470
- Logits/rejected: 1.9412
- Logits/chosen: 2.1408

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
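Since the `trl` and `dpo` tags indicate the run was driven by TRL's `DPOTrainer`, the list above maps roughly onto a `DPOConfig` as sketched below. This is a minimal reconstruction, not the original training script: the DPO `beta`, the mixed-precision mode, and the QLoRA adapter settings are not recorded in this card, so the values marked as assumptions are common defaults.

```python
# Hedged reconstruction of the run configuration from the list above.
# Values marked "assumption" are NOT recorded in this model card.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # "train_batch_size" above
    per_device_eval_batch_size=8,    # "eval_batch_size" above
    gradient_accumulation_steps=4,   # 4 per device * 4 accum * 8 GPUs = 128 total
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                        # assumption: DPO beta not stated in the card
    bf16=True,                       # assumption: precision not stated in the card
)
```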
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6914 | 0.0315 | 100 | 0.6912 | 0.0006 | -0.0036 | 0.6340 | 0.0042 | -36.2582 | -32.7125 | -1.6841 | -1.6367 |
| 0.6743 | 0.0629 | 200 | 0.6753 | -0.0009 | -0.0462 | 0.6321 | 0.0454 | -40.5232 | -32.8573 | -1.5154 | -1.4649 |
| 0.6112 | 0.0944 | 300 | 0.5905 | -0.5010 | -0.8365 | 0.6631 | 0.3356 | -119.5518 | -82.8670 | -0.5166 | -0.4325 |
| 0.4477 | 0.1258 | 400 | 0.4026 | -1.9267 | -3.0850 | 0.7201 | 1.1583 | -344.3972 | -225.4428 | -0.5023 | -0.3494 |
| 0.3583 | 0.1573 | 500 | 0.3063 | -2.4869 | -4.1367 | 0.7646 | 1.6498 | -449.5698 | -281.4605 | 0.3124 | 0.4717 |
| 0.3041 | 0.1887 | 600 | 0.2405 | -2.9070 | -4.9732 | 0.7918 | 2.0662 | -533.2189 | -323.4665 | 0.9644 | 1.1113 |
| 0.2487 | 0.2202 | 700 | 0.1964 | -3.4123 | -5.8172 | 0.8209 | 2.4050 | -617.6231 | -373.9985 | 1.1343 | 1.2933 |
| 0.218 | 0.2517 | 800 | 0.1547 | -3.6771 | -6.6251 | 0.8336 | 2.9480 | -698.4094 | -400.4795 | 1.5710 | 1.7290 |
| 0.1858 | 0.2831 | 900 | 0.1394 | -3.5484 | -6.6808 | 0.8485 | 3.1324 | -703.9799 | -387.6123 | 1.6988 | 1.8631 |
| 0.173 | 0.3146 | 1000 | 0.1176 | -3.4824 | -6.7705 | 0.8649 | 3.2881 | -712.9531 | -381.0118 | 1.8190 | 1.9776 |
| 0.1494 | 0.3460 | 1100 | 0.0979 | -3.7942 | -7.4529 | 0.8713 | 3.6587 | -781.1857 | -412.1861 | 1.8179 | 1.9865 |
| 0.149 | 0.3775 | 1200 | 0.0817 | -4.1856 | -8.2504 | 0.8843 | 4.0648 | -860.9355 | -451.3316 | 1.8715 | 2.0581 |
| 0.1143 | 0.4089 | 1300 | 0.0702 | -4.2444 | -8.6154 | 0.8884 | 4.3710 | -897.4431 | -457.2141 | 1.7765 | 1.9770 |
| 0.1204 | 0.4404 | 1400 | 0.0642 | -4.1442 | -8.6112 | 0.8966 | 4.4670 | -897.0154 | -447.1863 | 2.1996 | 2.3734 |
| 0.1013 | 0.4718 | 1500 | 0.0580 | -4.5031 | -9.1159 | 0.8951 | 4.6128 | -947.4904 | -483.0838 | 1.9514 | 2.1364 |
| 0.1011 | 0.5033 | 1600 | 0.0567 | -4.0373 | -8.5779 | 0.9067 | 4.5406 | -893.6846 | -436.5011 | 1.9239 | 2.1103 |
| 0.0853 | 0.5348 | 1700 | 0.0482 | -4.3119 | -9.2927 | 0.9067 | 4.9808 | -965.1708 | -463.9637 | 2.0648 | 2.2336 |
| 0.0897 | 0.5662 | 1800 | 0.0449 | -4.3018 | -9.4275 | 0.9101 | 5.1257 | -978.6490 | -462.9552 | 1.9037 | 2.0822 |
| 0.0717 | 0.5977 | 1900 | 0.0402 | -4.4391 | -9.8395 | 0.9112 | 5.4004 | -1019.8445 | -476.6779 | 2.0003 | 2.1749 |
| 0.0487 | 0.6291 | 2000 | 0.0368 | -5.4728 | -11.3180 | 0.9078 | 5.8452 | -1167.6968 | -580.0486 | 1.9355 | 2.1422 |
| 0.0683 | 0.6606 | 2100 | 0.0356 | -4.6736 | -10.2835 | 0.9190 | 5.6099 | -1064.2465 | -500.1268 | 2.0206 | 2.2058 |
| 0.0514 | 0.6920 | 2200 | 0.0341 | -4.6025 | -10.2228 | 0.9209 | 5.6203 | -1058.1812 | -493.0187 | 1.9362 | 2.1272 |
| 0.0623 | 0.7235 | 2300 | 0.0326 | -4.9398 | -10.7061 | 0.9213 | 5.7663 | -1106.5096 | -526.7491 | 1.8240 | 2.0327 |
| 0.0693 | 0.7550 | 2400 | 0.0313 | -4.8024 | -10.6310 | 0.9231 | 5.8286 | -1098.9999 | -513.0095 | 1.8580 | 2.0583 |
| 0.0543 | 0.7864 | 2500 | 0.0303 | -4.8132 | -10.7352 | 0.9228 | 5.9221 | -1109.4199 | -514.0873 | 1.9534 | 2.1471 |
| 0.0555 | 0.8179 | 2600 | 0.0301 | -4.7251 | -10.5626 | 0.9261 | 5.8375 | -1092.1620 | -505.2810 | 1.9398 | 2.1357 |
| 0.0646 | 0.8493 | 2700 | 0.0294 | -4.6930 | -10.6307 | 0.9261 | 5.9377 | -1098.9694 | -502.0694 | 2.0003 | 2.1947 |
| 0.0546 | 0.8808 | 2800 | 0.0287 | -4.8085 | -10.8169 | 0.9250 | 6.0084 | -1117.5887 | -513.6258 | 1.9596 | 2.1607 |
| 0.0702 | 0.9122 | 2900 | 0.0288 | -4.6970 | -10.6904 | 0.9243 | 5.9934 | -1104.9371 | -502.4718 | 1.9696 | 2.1647 |
| 0.0623 | 0.9437 | 3000 | 0.0286 | -4.7098 | -10.6743 | 0.9269 | 5.9645 | -1103.3302 | -503.7507 | 1.9440 | 2.1437 |
| 0.0593 | 0.9751 | 3100 | 0.0287 | -4.6985 | -10.6531 | 0.9276 | 5.9547 | -1101.2122 | -502.6163 | 1.9469 | 2.1464 |

### Framework versions

- PEFT 0.13.0
- Transformers 4.45.1
- PyTorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0
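For completeness, here is a minimal inference sketch using the pinned `peft` and `transformers` versions above. The adapter repository id is a placeholder (this card does not state where the adapter is hosted); the base checkpoint and the example prompt's math-reasoning flavor follow the card's metadata and dataset.

```python
# Minimal sketch: load the DPO-tuned PEFT adapter on top of its base model.
# "<this-repo-id>" is a placeholder; the card does not state the adapter's repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TII-Frontier-Team/falcon3-3b-instruct"  # base_model from the card metadata
adapter_id = "<this-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [{"role": "user", "content": "A car travels 60 km in 45 minutes. What is its average speed in km/h?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```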