# zephyr-7b-dpo-qlora
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:
- Loss: 0.4788
- Rewards/chosen: -2.6215
- Rewards/rejected: -3.9187
- Rewards/accuracies: 0.7465
- Rewards/margins: 1.2972
- Logps/rejected: -636.4379
- Logps/chosen: -526.7527
- Logits/rejected: -1.0290
- Logits/chosen: -1.1652
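
For reference, the `Rewards/*` metrics are DPO's implicit rewards: β times the gap between the policy's and the reference model's sequence log-probabilities on the chosen and rejected completions. The sketch below shows how the loss and these metrics relate; `beta=0.1` is an assumption (TRL's default), not a value recorded in this card.

```python
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Implicit DPO rewards, margins, accuracy, and loss from sequence log-probs."""
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = rewards_chosen - rewards_rejected                             # Rewards/margins
    loss = -F.logsigmoid(margins).mean()         # standard sigmoid DPO loss
    accuracy = (margins > 0).float().mean()      # Rewards/accuracies
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```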
## Model description

This is a QLoRA model (LoRA adapters over a 4-bit quantized base) trained with Direct Preference Optimization (DPO) on top of alignment-handbook/zephyr-7b-sft-qlora, which is itself a supervised fine-tune of mistralai/Mistral-7B-v0.1. It follows the Zephyr recipe from the alignment handbook: supervised fine-tuning on chat data, followed by DPO on binarized preference pairs.
## Intended uses & limitations

The model is intended for chat-style assistant use and as a research artifact reproducing the Zephyr DPO recipe with QLoRA. Like other Zephyr-style models, it was aligned only with the preference data above and has had no dedicated safety tuning or red-teaming, so it can produce inaccurate or problematic outputs; downstream applications should add their own safeguards. A loading sketch follows.
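
As a quick start, the adapter can be loaded with PEFT. A minimal sketch; the generation settings are illustrative, and it assumes the tokenizer and chat template ship with the adapter repo:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "guoqiang-x/zephyr-7b-dpo-qlora"  # this repo

# AutoPeftModelForCausalLM resolves the base model from the adapter config
# and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "What is DPO in one sentence?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```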
## Training and evaluation data

Training and evaluation use the HuggingFaceH4/ultrafeedback_binarized dataset, which pairs each prompt with a chosen and a rejected completion derived from UltraFeedback ratings. The evaluation metrics above are computed on its held-out preference split.
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
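
For context, a minimal sketch of how these hyperparameters map onto TRL's `DPOTrainer`, as in the alignment-handbook recipe; `model`, `tokenizer`, and the datasets are assumed to be prepared elsewhere, and `beta=0.1` is an assumption rather than a value recorded here:

```python
from trl import DPOConfig, DPOTrainer

# Hypothetical wiring: `model`, `tokenizer`, `train_dataset`, and `eval_dataset`
# are assumed to be prepared as in the alignment-handbook DPO script.
training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size of 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,  # assumption: bf16 mixed precision, as in the handbook recipe
    beta=0.1,   # assumption: TRL's default DPO beta, not recorded in this card
)

trainer = DPOTrainer(
    model=model,     # the QLoRA-wrapped SFT model
    ref_model=None,  # with PEFT adapters, the frozen base serves as the reference
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,  # newer TRL versions take `processing_class` instead
)
trainer.train()
```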
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:--------------:|
0.6807 | 0.0262 | 100 | 0.6809 | 0.0514 | 0.0256 | 0.6555 | 0.0258 | -242.0131 | -259.4604 | -2.0551 | -2.1482 |
0.6438 | 0.0523 | 200 | 0.6356 | -0.1881 | -0.3389 | 0.6760 | 0.1508 | -278.4615 | -283.4154 | -2.0113 | -2.1000 |
0.6073 | 0.0785 | 300 | 0.6054 | -0.6866 | -0.9744 | 0.6815 | 0.2878 | -342.0091 | -333.2583 | -1.9949 | -2.0782 |
0.5956 | 0.1047 | 400 | 0.5824 | -1.4485 | -1.9599 | 0.6830 | 0.5114 | -440.5653 | -409.4522 | -1.5844 | -1.6758 |
0.5643 | 0.1309 | 500 | 0.5726 | -1.1458 | -1.7589 | 0.6915 | 0.6131 | -420.4636 | -379.1804 | -1.5624 | -1.6658 |
0.5373 | 0.1570 | 600 | 0.5631 | -1.1286 | -1.8164 | 0.7030 | 0.6878 | -426.2121 | -377.4605 | -1.6945 | -1.7955 |
0.5394 | 0.1832 | 700 | 0.5474 | -2.2700 | -3.0663 | 0.7040 | 0.7963 | -551.1992 | -491.6012 | -1.1628 | -1.2719 |
0.4983 | 0.2094 | 800 | 0.5323 | -1.5616 | -2.2966 | 0.7225 | 0.7349 | -474.2269 | -420.7654 | -1.5104 | -1.5996 |
0.4763 | 0.2355 | 900 | 0.5386 | -1.6130 | -2.4122 | 0.7160 | 0.7992 | -485.7890 | -425.9030 | -1.4156 | -1.4989 |
0.5266 | 0.2617 | 1000 | 0.5234 | -2.1788 | -3.0546 | 0.7280 | 0.8758 | -550.0311 | -482.4831 | -1.2043 | -1.3050 |
0.59 | 0.2879 | 1100 | 0.5278 | -1.6937 | -2.3427 | 0.7300 | 0.6490 | -478.8385 | -433.9710 | -0.9899 | -1.1100 |
0.5724 | 0.3141 | 1200 | 0.5071 | -1.5548 | -2.4072 | 0.7380 | 0.8523 | -485.2895 | -420.0863 | -1.1349 | -1.2473 |
0.5457 | 0.3402 | 1300 | 0.5013 | -1.7544 | -2.6264 | 0.7435 | 0.8721 | -507.2138 | -440.0385 | -1.2424 | -1.3403 |
0.5423 | 0.3664 | 1400 | 0.5132 | -1.6381 | -2.6114 | 0.7210 | 0.9733 | -505.7077 | -428.4097 | -1.5063 | -1.5869 |
0.4492 | 0.3926 | 1500 | 0.5122 | -1.5882 | -2.5891 | 0.7260 | 1.0010 | -503.4828 | -423.4175 | -1.4972 | -1.5950 |
0.5491 | 0.4187 | 1600 | 0.4956 | -1.6959 | -2.7056 | 0.7395 | 1.0098 | -515.1351 | -434.1913 | -1.1293 | -1.2525 |
0.5408 | 0.4449 | 1700 | 0.5111 | -3.0361 | -4.2392 | 0.7305 | 1.2030 | -668.4869 | -568.2142 | -1.0520 | -1.1774 |
0.4705 | 0.4711 | 1800 | 0.4949 | -2.1236 | -3.1894 | 0.7435 | 1.0658 | -563.5121 | -476.9663 | -1.3479 | -1.4508 |
0.4447 | 0.4973 | 1900 | 0.4984 | -2.0350 | -3.1505 | 0.7420 | 1.1155 | -559.6229 | -468.1011 | -1.1711 | -1.2951 |
0.4561 | 0.5234 | 2000 | 0.4929 | -1.9668 | -2.9588 | 0.7420 | 0.9919 | -540.4462 | -461.2839 | -1.3557 | -1.4696 |
0.5068 | 0.5496 | 2100 | 0.4969 | -3.1452 | -4.3633 | 0.7350 | 1.2180 | -680.8954 | -579.1231 | -1.1150 | -1.2426 |
0.4839 | 0.5758 | 2200 | 0.4927 | -2.3797 | -3.4376 | 0.7405 | 1.0579 | -588.3315 | -502.5681 | -1.2706 | -1.3886 |
0.4729 | 0.6019 | 2300 | 0.4924 | -2.8461 | -4.1210 | 0.7405 | 1.2749 | -656.6667 | -549.2124 | -1.0868 | -1.2145 |
0.4501 | 0.6281 | 2400 | 0.4900 | -2.9743 | -4.2366 | 0.7430 | 1.2623 | -668.2346 | -562.0333 | -0.9978 | -1.1257 |
0.4982 | 0.6543 | 2500 | 0.4872 | -2.4585 | -3.6758 | 0.7420 | 1.2173 | -612.1486 | -510.4511 | -1.0532 | -1.1862 |
0.4649 | 0.6805 | 2600 | 0.4881 | -2.5759 | -3.8831 | 0.7450 | 1.3072 | -632.8793 | -522.1908 | -1.0793 | -1.2115 |
0.556 | 0.7066 | 2700 | 0.4841 | -2.3432 | -3.5113 | 0.7460 | 1.1680 | -595.6959 | -498.9265 | -1.1004 | -1.2295 |
0.4617 | 0.7328 | 2800 | 0.4832 | -2.3495 | -3.6183 | 0.7460 | 1.2689 | -606.4033 | -499.5496 | -1.0627 | -1.1960 |
0.4916 | 0.7590 | 2900 | 0.4800 | -2.6711 | -3.9165 | 0.7455 | 1.2454 | -636.2195 | -531.7142 | -1.0032 | -1.1418 |
0.4708 | 0.7851 | 3000 | 0.4797 | -2.6166 | -3.7883 | 0.7475 | 1.1717 | -623.4008 | -526.2621 | -0.9962 | -1.1355 |
0.4804 | 0.8113 | 3100 | 0.4807 | -2.8224 | -4.1220 | 0.7475 | 1.2996 | -656.7728 | -546.8435 | -0.9953 | -1.1341 |
0.4866 | 0.8375 | 3200 | 0.4777 | -2.5496 | -3.7894 | 0.7475 | 1.2398 | -623.5103 | -519.5614 | -1.0276 | -1.1641 |
0.4967 | 0.8636 | 3300 | 0.4786 | -2.5578 | -3.8108 | 0.7480 | 1.2530 | -625.6535 | -520.3804 | -1.0241 | -1.1608 |
0.4272 | 0.8898 | 3400 | 0.4797 | -2.7223 | -4.0287 | 0.7460 | 1.3065 | -647.4435 | -536.8282 | -1.0071 | -1.1445 |
0.5272 | 0.9160 | 3500 | 0.4797 | -2.7144 | -4.0320 | 0.7470 | 1.3176 | -647.7730 | -536.0449 | -1.0233 | -1.1601 |
0.4441 | 0.9422 | 3600 | 0.4790 | -2.6459 | -3.9513 | 0.7470 | 1.3054 | -639.7043 | -529.1944 | -1.0278 | -1.1641 |
0.4823 | 0.9683 | 3700 | 0.4789 | -2.6279 | -3.9262 | 0.7480 | 1.2982 | -637.1880 | -527.3952 | -1.0329 | -1.1687 |
0.4996 | 0.9945 | 3800 | 0.4788 | -2.6215 | -3.9183 | 0.7475 | 1.2968 | -636.4029 | -526.7561 | -1.0296 | -1.1658 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.1.2+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
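
Because the adapter was trained with QLoRA, inference can also attach it to a 4-bit quantized base. A sketch assuming the standard bitsandbytes NF4 setup; the training run's exact quantization config is not recorded in this card, and whether the SFT adapter must be merged first depends on how this adapter was saved:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed NF4 config matching common QLoRA practice.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # the recorded base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "guoqiang-x/zephyr-7b-dpo-qlora")
tokenizer = AutoTokenizer.from_pretrained("guoqiang-x/zephyr-7b-dpo-qlora")
```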
### Base model

mistralai/Mistral-7B-v0.1 (via the SFT adapter alignment-handbook/zephyr-7b-sft-qlora)