# eurus-dpop-qlora-uf-5e-7

This model is a fine-tuned version of [openbmb/Eurus-7b-sft](https://huggingface.co/openbmb/Eurus-7b-sft) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6859
- Positive Losses: 0.2505
- Dpo Losses: 0.6519
- Rewards/chosen: 0.1680
- Rewards/rejected: 0.0763
- Rewards/accuracies: 0.6850
- Rewards/margins: 0.0917
- Rewards/margins Max: 0.3593
- Rewards/margins Min: -0.1191
- Rewards/margins Std: 0.1608
- Logps/rejected: -249.9021
- Logps/chosen: -258.0758
- Logits/rejected: -2.0757
- Logits/chosen: -2.1855
## Model description
More information needed
## Intended uses & limitations
More information needed
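
Since the repository holds a QLoRA adapter trained with PEFT (see the framework versions below), it can be loaded on top of the base model for inference. A minimal sketch, assuming the adapter weights are hosted at `just1nseo/eurus-dpop-qlora-uf-5e-7` and that bfloat16 inference is acceptable (the example prompt is purely illustrative):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Loads openbmb/Eurus-7b-sft and applies the LoRA adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    "just1nseo/eurus-dpop-qlora-uf-5e-7",
    torch_dtype=torch.bfloat16,  # assumption: precision not stated in the card
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-7b-sft")

prompt = "Explain the difference between DPO and SFT in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```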
## Training and evaluation data
More information needed
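
The training data itself is public and can be inspected directly. A minimal sketch, assuming the standard `train_prefs` preference split was used, as is typical for DPO-style fine-tuning on this dataset:

```python
from datasets import load_dataset

# Each preference example holds a prompt plus a preferred ("chosen")
# and a dispreferred ("rejected") response.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
print(ds.column_names)   # e.g. ['prompt', 'chosen', 'rejected', ...]
print(ds[0]["prompt"])
```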
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
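
These settings imply an effective train batch size of 4 per device × 2 GPUs × 2 accumulation steps = 16, matching `total_train_batch_size` above. A sketch of how they map onto `transformers.TrainingArguments`; the output directory and mixed-precision flag are assumptions not stated in the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eurus-dpop-qlora-uf-5e-7",  # assumption
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x 2 GPUs x 2 accumulation = 16 total
    per_device_eval_batch_size=8,    # x 2 GPUs = 16 total
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: precision not stated in the card
)
```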
### Training results
Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.03 | 100 | 0.6927 | 0.0087 | 0.6918 | 0.0151 | 0.0123 | 0.5400 | 0.0028 | 0.0252 | -0.0152 | 0.0132 | -256.3040 | -273.3622 | -2.1875 | -2.3101 |
0.6865 | 0.05 | 200 | 0.6893 | 0.0127 | 0.6876 | 0.0411 | 0.0296 | 0.6190 | 0.0114 | 0.0652 | -0.0319 | 0.0318 | -254.5748 | -270.7704 | -2.1895 | -2.3112 |
0.6858 | 0.08 | 300 | 0.6810 | 0.0282 | 0.6762 | 0.1236 | 0.0879 | 0.6770 | 0.0357 | 0.1625 | -0.0694 | 0.0756 | -248.7482 | -262.5165 | -2.1683 | -2.2892 |
0.6921 | 0.1 | 400 | 0.6993 | 0.3136 | 0.6660 | 0.1158 | 0.0575 | 0.7100 | 0.0583 | 0.2335 | -0.0929 | 0.1089 | -251.7885 | -263.2999 | -2.1484 | -2.2675 |
0.7032 | 0.13 | 500 | 0.6833 | 0.1186 | 0.6693 | 0.1493 | 0.0978 | 0.6690 | 0.0515 | 0.2281 | -0.0970 | 0.1084 | -247.7569 | -259.9434 | -2.1268 | -2.2450 |
0.6709 | 0.16 | 600 | 0.6814 | 0.1022 | 0.6685 | 0.1465 | 0.0936 | 0.6730 | 0.0529 | 0.2253 | -0.0901 | 0.1063 | -248.1758 | -260.2231 | -2.1172 | -2.2344 |
0.6987 | 0.18 | 700 | 0.6806 | 0.1160 | 0.6673 | 0.1507 | 0.0949 | 0.6790 | 0.0558 | 0.2360 | -0.0982 | 0.1127 | -248.0485 | -259.8086 | -2.0848 | -2.2010 |
0.6852 | 0.21 | 800 | 0.6832 | 0.1528 | 0.6649 | 0.1516 | 0.0903 | 0.6790 | 0.0614 | 0.2546 | -0.1056 | 0.1215 | -248.5094 | -259.7122 | -2.1049 | -2.2238 |
0.6876 | 0.24 | 900 | 0.6822 | 0.1465 | 0.6644 | 0.1562 | 0.0937 | 0.6830 | 0.0625 | 0.2574 | -0.1067 | 0.1231 | -248.1658 | -259.2580 | -2.0901 | -2.2109 |
0.698 | 0.26 | 1000 | 0.6801 | 0.1039 | 0.6644 | 0.1618 | 0.0992 | 0.6730 | 0.0627 | 0.2654 | -0.1054 | 0.1244 | -247.6179 | -258.6924 | -2.0867 | -2.2065 |
0.7006 | 0.29 | 1100 | 0.6877 | 0.1930 | 0.6611 | 0.1586 | 0.0883 | 0.6860 | 0.0703 | 0.2899 | -0.1088 | 0.1345 | -248.7062 | -259.0172 | -2.0660 | -2.1846 |
0.6737 | 0.31 | 1200 | 0.6805 | 0.1326 | 0.6622 | 0.1634 | 0.0958 | 0.6820 | 0.0675 | 0.2808 | -0.1050 | 0.1300 | -247.9524 | -258.5385 | -2.0612 | -2.1789 |
0.7092 | 0.34 | 1300 | 0.6956 | 0.3335 | 0.6572 | 0.1397 | 0.0607 | 0.6950 | 0.0790 | 0.3144 | -0.1128 | 0.1445 | -251.4707 | -260.9076 | -2.0676 | -2.1846 |
0.717 | 0.37 | 1400 | 0.6827 | 0.1951 | 0.6599 | 0.1642 | 0.0910 | 0.6880 | 0.0733 | 0.3075 | -0.1193 | 0.1424 | -248.4406 | -258.4545 | -2.0552 | -2.1724 |
0.69 | 0.39 | 1500 | 0.6954 | 0.3337 | 0.6561 | 0.1557 | 0.0736 | 0.6940 | 0.0821 | 0.3383 | -0.1185 | 0.1527 | -250.1766 | -259.3101 | -2.0743 | -2.1905 |
0.7032 | 0.42 | 1600 | 0.6884 | 0.2603 | 0.6583 | 0.1568 | 0.0798 | 0.6800 | 0.0769 | 0.3197 | -0.1150 | 0.1463 | -249.5549 | -259.2010 | -2.0497 | -2.1647 |
0.6745 | 0.44 | 1700 | 0.6896 | 0.2743 | 0.6554 | 0.1534 | 0.0702 | 0.6960 | 0.0833 | 0.3295 | -0.1172 | 0.1496 | -250.5217 | -259.5353 | -2.0491 | -2.1657 |
0.738 | 0.47 | 1800 | 0.6787 | 0.1378 | 0.6589 | 0.1660 | 0.0908 | 0.6820 | 0.0753 | 0.3062 | -0.1149 | 0.1413 | -248.4617 | -258.2737 | -2.0620 | -2.1769 |
0.7075 | 0.5 | 1900 | 0.6833 | 0.1859 | 0.6579 | 0.1600 | 0.0824 | 0.6900 | 0.0776 | 0.3193 | -0.1102 | 0.1444 | -249.2928 | -258.8771 | -2.0706 | -2.1827 |
0.6648 | 0.52 | 2000 | 0.6810 | 0.1632 | 0.6568 | 0.1659 | 0.0856 | 0.6890 | 0.0804 | 0.3294 | -0.1154 | 0.1488 | -248.9815 | -258.2854 | -2.0768 | -2.1912 |
0.6852 | 0.55 | 2100 | 0.6865 | 0.2345 | 0.6555 | 0.1580 | 0.0750 | 0.6850 | 0.0830 | 0.3337 | -0.1140 | 0.1500 | -250.0391 | -259.0782 | -2.0744 | -2.1866 |
0.6703 | 0.58 | 2200 | 0.6875 | 0.2456 | 0.6551 | 0.1565 | 0.0726 | 0.6960 | 0.0838 | 0.3356 | -0.1124 | 0.1496 | -250.2751 | -259.2299 | -2.0797 | -2.1916 |
0.6962 | 0.6 | 2300 | 0.6828 | 0.1892 | 0.6568 | 0.1654 | 0.0853 | 0.6780 | 0.0801 | 0.3244 | -0.1116 | 0.1460 | -249.0074 | -258.3336 | -2.0778 | -2.1900 |
0.667 | 0.63 | 2400 | 0.6853 | 0.2336 | 0.6538 | 0.1636 | 0.0765 | 0.7000 | 0.0870 | 0.3434 | -0.1186 | 0.1546 | -249.8843 | -258.5188 | -2.0804 | -2.1916 |
0.69 | 0.65 | 2500 | 0.6837 | 0.2197 | 0.6545 | 0.1689 | 0.0834 | 0.6900 | 0.0855 | 0.3432 | -0.1193 | 0.1545 | -249.1985 | -257.9834 | -2.0760 | -2.1873 |
0.6967 | 0.68 | 2600 | 0.6840 | 0.2224 | 0.6551 | 0.1666 | 0.0824 | 0.6940 | 0.0842 | 0.3396 | -0.1165 | 0.1524 | -249.3017 | -258.2182 | -2.0761 | -2.1875 |
0.6728 | 0.71 | 2700 | 0.6821 | 0.1880 | 0.6566 | 0.1701 | 0.0892 | 0.6710 | 0.0809 | 0.3312 | -0.1169 | 0.1501 | -248.6199 | -257.8645 | -2.0769 | -2.1873 |
0.6719 | 0.73 | 2800 | 0.6800 | 0.1748 | 0.6558 | 0.1733 | 0.0903 | 0.6800 | 0.0830 | 0.3393 | -0.1211 | 0.1541 | -248.5062 | -257.5499 | -2.0781 | -2.1884 |
0.7023 | 0.76 | 2900 | 0.6865 | 0.2486 | 0.6528 | 0.1653 | 0.0755 | 0.6900 | 0.0898 | 0.3542 | -0.1207 | 0.1596 | -249.9836 | -258.3481 | -2.0760 | -2.1862 |
0.6635 | 0.79 | 3000 | 0.6847 | 0.2279 | 0.6533 | 0.1677 | 0.0793 | 0.6870 | 0.0885 | 0.3489 | -0.1193 | 0.1573 | -249.6095 | -258.1035 | -2.0810 | -2.1905 |
0.6855 | 0.81 | 3100 | 0.6828 | 0.2071 | 0.6537 | 0.1726 | 0.0848 | 0.6840 | 0.0878 | 0.3501 | -0.1204 | 0.1579 | -249.0582 | -257.6195 | -2.0781 | -2.1883 |
0.666 | 0.84 | 3200 | 0.6837 | 0.2206 | 0.6530 | 0.1705 | 0.0813 | 0.6850 | 0.0892 | 0.3528 | -0.1200 | 0.1588 | -249.4088 | -257.8239 | -2.0745 | -2.1844 |
0.6763 | 0.86 | 3300 | 0.6838 | 0.2216 | 0.6529 | 0.1705 | 0.0811 | 0.6820 | 0.0895 | 0.3537 | -0.1195 | 0.1588 | -249.4304 | -257.8217 | -2.0758 | -2.1856 |
0.6714 | 0.89 | 3400 | 0.6844 | 0.2307 | 0.6527 | 0.1704 | 0.0803 | 0.6850 | 0.0900 | 0.3554 | -0.1200 | 0.1595 | -249.5051 | -257.8407 | -2.0722 | -2.1824 |
0.7112 | 0.92 | 3500 | 0.6872 | 0.2662 | 0.6516 | 0.1664 | 0.0738 | 0.6870 | 0.0925 | 0.3616 | -0.1188 | 0.1616 | -250.1524 | -258.2368 | -2.0751 | -2.1849 |
0.6722 | 0.94 | 3600 | 0.6860 | 0.2528 | 0.6519 | 0.1677 | 0.0760 | 0.6870 | 0.0917 | 0.3591 | -0.1187 | 0.1607 | -249.9364 | -258.1039 | -2.0720 | -2.1821 |
0.6902 | 0.97 | 3700 | 0.6861 | 0.2512 | 0.6520 | 0.1680 | 0.0764 | 0.6880 | 0.0915 | 0.3587 | -0.1190 | 0.1606 | -249.8921 | -258.0789 | -2.0760 | -2.1857 |
0.6921 | 0.99 | 3800 | 0.6857 | 0.2497 | 0.6519 | 0.1680 | 0.0763 | 0.6910 | 0.0917 | 0.3589 | -0.1189 | 0.1607 | -249.9092 | -258.0792 | -2.0777 | -2.1871 |
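
For reference, the reward columns follow the standard DPO convention of implicit rewards derived from policy/reference log-probability ratios (the β used for this run is not stated in the card):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r_\theta(x, y_w) - r_\theta(x, y_l),
$$

where $y_w$ and $y_l$ denote the chosen and rejected responses. The "dpop" in the model name together with the separate Positive Losses column suggests a DPO-Positive (DPOP) style objective, whose extra penalty term discourages the policy from lowering the likelihood of chosen responses relative to the reference:

$$
\text{penalty} = \max\!\left(0,\ \log \frac{\pi_{\mathrm{ref}}(y_w \mid x)}{\pi_\theta(y_w \mid x)}\right),
$$

weighted by a coefficient not stated in the card; the reported Positive Losses appear to track this term.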
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2