eurus-dpop-qlora-uf-5e-7

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6859
  • Positive Losses: 0.2505
  • Dpo Losses: 0.6519
  • Rewards/chosen: 0.1680
  • Rewards/rejected: 0.0763
  • Rewards/accuracies: 0.6850
  • Rewards/margins: 0.0917
  • Rewards/margins Max: 0.3593
  • Rewards/margins Min: -0.1191
  • Rewards/margins Std: 0.1608
  • Logps/rejected: -249.9021
  • Logps/chosen: -258.0758
  • Logits/rejected: -2.0757
  • Logits/chosen: -2.1855
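
As a quick sanity check on the numbers above, the reported reward margin should be the difference between the chosen and rejected rewards. This is a minimal sketch that assumes the margin is logged as the mean of per-example (chosen minus rejected) differences, as is common in DPO-style trainers. The card does not confirm how its metrics were computed.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = 0.1680
rewards_rejected = 0.0763
reported_margin = 0.0917

# Assumption: margin = mean(chosen - rejected), which equals
# mean(chosen) - mean(rejected) because the mean is linear.
margin = rewards_chosen - rewards_rejected

# Allow for rounding in the four-decimal values reported on the card.
assert abs(margin - reported_margin) < 5e-4
print(f"recomputed margin: {margin:.4f}")
```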

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

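The model was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset. More information needed on the splits and preprocessing used.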

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
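
The batch-size and scheduler entries above can be related with a little arithmetic. The sketch below recomputes the total train batch size and illustrates a cosine schedule with linear warmup, written in plain Python and modelled on the usual linear-warmup-then-cosine-decay rule. The `total_steps` value is a hypothetical placeholder, since the card does not state the exact number of optimizer steps, and `cosine_lr` is an illustrative helper rather than the training code itself.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 4
num_devices = 2
gradient_accumulation_steps = 2
peak_lr = 5e-06
warmup_ratio = 0.1

# Effective (total) train batch size: 4 * 2 * 2 = 16, matching the card.
effective_batch = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)


def cosine_lr(step, total_steps, peak=peak_lr, warmup=warmup_ratio):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical step count, for illustration only.
total_steps = 1000
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr(s, total_steps))
```

Under these assumptions the rate climbs linearly to the peak over the first 10% of steps, then decays along a half cosine to zero at the final step.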

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.03 | 100 | 0.6927 | 0.0087 | 0.6918 | 0.0151 | 0.0123 | 0.5400 | 0.0028 | 0.0252 | -0.0152 | 0.0132 | -256.3040 | -273.3622 | -2.1875 | -2.3101 |
| 0.6865 | 0.05 | 200 | 0.6893 | 0.0127 | 0.6876 | 0.0411 | 0.0296 | 0.6190 | 0.0114 | 0.0652 | -0.0319 | 0.0318 | -254.5748 | -270.7704 | -2.1895 | -2.3112 |
| 0.6858 | 0.08 | 300 | 0.6810 | 0.0282 | 0.6762 | 0.1236 | 0.0879 | 0.6770 | 0.0357 | 0.1625 | -0.0694 | 0.0756 | -248.7482 | -262.5165 | -2.1683 | -2.2892 |
| 0.6921 | 0.1 | 400 | 0.6993 | 0.3136 | 0.6660 | 0.1158 | 0.0575 | 0.7100 | 0.0583 | 0.2335 | -0.0929 | 0.1089 | -251.7885 | -263.2999 | -2.1484 | -2.2675 |
| 0.7032 | 0.13 | 500 | 0.6833 | 0.1186 | 0.6693 | 0.1493 | 0.0978 | 0.6690 | 0.0515 | 0.2281 | -0.0970 | 0.1084 | -247.7569 | -259.9434 | -2.1268 | -2.2450 |
| 0.6709 | 0.16 | 600 | 0.6814 | 0.1022 | 0.6685 | 0.1465 | 0.0936 | 0.6730 | 0.0529 | 0.2253 | -0.0901 | 0.1063 | -248.1758 | -260.2231 | -2.1172 | -2.2344 |
| 0.6987 | 0.18 | 700 | 0.6806 | 0.1160 | 0.6673 | 0.1507 | 0.0949 | 0.6790 | 0.0558 | 0.2360 | -0.0982 | 0.1127 | -248.0485 | -259.8086 | -2.0848 | -2.2010 |
| 0.6852 | 0.21 | 800 | 0.6832 | 0.1528 | 0.6649 | 0.1516 | 0.0903 | 0.6790 | 0.0614 | 0.2546 | -0.1056 | 0.1215 | -248.5094 | -259.7122 | -2.1049 | -2.2238 |
| 0.6876 | 0.24 | 900 | 0.6822 | 0.1465 | 0.6644 | 0.1562 | 0.0937 | 0.6830 | 0.0625 | 0.2574 | -0.1067 | 0.1231 | -248.1658 | -259.2580 | -2.0901 | -2.2109 |
| 0.698 | 0.26 | 1000 | 0.6801 | 0.1039 | 0.6644 | 0.1618 | 0.0992 | 0.6730 | 0.0627 | 0.2654 | -0.1054 | 0.1244 | -247.6179 | -258.6924 | -2.0867 | -2.2065 |
| 0.7006 | 0.29 | 1100 | 0.6877 | 0.1930 | 0.6611 | 0.1586 | 0.0883 | 0.6860 | 0.0703 | 0.2899 | -0.1088 | 0.1345 | -248.7062 | -259.0172 | -2.0660 | -2.1846 |
| 0.6737 | 0.31 | 1200 | 0.6805 | 0.1326 | 0.6622 | 0.1634 | 0.0958 | 0.6820 | 0.0675 | 0.2808 | -0.1050 | 0.1300 | -247.9524 | -258.5385 | -2.0612 | -2.1789 |
| 0.7092 | 0.34 | 1300 | 0.6956 | 0.3335 | 0.6572 | 0.1397 | 0.0607 | 0.6950 | 0.0790 | 0.3144 | -0.1128 | 0.1445 | -251.4707 | -260.9076 | -2.0676 | -2.1846 |
| 0.717 | 0.37 | 1400 | 0.6827 | 0.1951 | 0.6599 | 0.1642 | 0.0910 | 0.6880 | 0.0733 | 0.3075 | -0.1193 | 0.1424 | -248.4406 | -258.4545 | -2.0552 | -2.1724 |
| 0.69 | 0.39 | 1500 | 0.6954 | 0.3337 | 0.6561 | 0.1557 | 0.0736 | 0.6940 | 0.0821 | 0.3383 | -0.1185 | 0.1527 | -250.1766 | -259.3101 | -2.0743 | -2.1905 |
| 0.7032 | 0.42 | 1600 | 0.6884 | 0.2603 | 0.6583 | 0.1568 | 0.0798 | 0.6800 | 0.0769 | 0.3197 | -0.1150 | 0.1463 | -249.5549 | -259.2010 | -2.0497 | -2.1647 |
| 0.6745 | 0.44 | 1700 | 0.6896 | 0.2743 | 0.6554 | 0.1534 | 0.0702 | 0.6960 | 0.0833 | 0.3295 | -0.1172 | 0.1496 | -250.5217 | -259.5353 | -2.0491 | -2.1657 |
| 0.738 | 0.47 | 1800 | 0.6787 | 0.1378 | 0.6589 | 0.1660 | 0.0908 | 0.6820 | 0.0753 | 0.3062 | -0.1149 | 0.1413 | -248.4617 | -258.2737 | -2.0620 | -2.1769 |
| 0.7075 | 0.5 | 1900 | 0.6833 | 0.1859 | 0.6579 | 0.1600 | 0.0824 | 0.6900 | 0.0776 | 0.3193 | -0.1102 | 0.1444 | -249.2928 | -258.8771 | -2.0706 | -2.1827 |
| 0.6648 | 0.52 | 2000 | 0.6810 | 0.1632 | 0.6568 | 0.1659 | 0.0856 | 0.6890 | 0.0804 | 0.3294 | -0.1154 | 0.1488 | -248.9815 | -258.2854 | -2.0768 | -2.1912 |
| 0.6852 | 0.55 | 2100 | 0.6865 | 0.2345 | 0.6555 | 0.1580 | 0.0750 | 0.6850 | 0.0830 | 0.3337 | -0.1140 | 0.1500 | -250.0391 | -259.0782 | -2.0744 | -2.1866 |
| 0.6703 | 0.58 | 2200 | 0.6875 | 0.2456 | 0.6551 | 0.1565 | 0.0726 | 0.6960 | 0.0838 | 0.3356 | -0.1124 | 0.1496 | -250.2751 | -259.2299 | -2.0797 | -2.1916 |
| 0.6962 | 0.6 | 2300 | 0.6828 | 0.1892 | 0.6568 | 0.1654 | 0.0853 | 0.6780 | 0.0801 | 0.3244 | -0.1116 | 0.1460 | -249.0074 | -258.3336 | -2.0778 | -2.1900 |
| 0.667 | 0.63 | 2400 | 0.6853 | 0.2336 | 0.6538 | 0.1636 | 0.0765 | 0.7000 | 0.0870 | 0.3434 | -0.1186 | 0.1546 | -249.8843 | -258.5188 | -2.0804 | -2.1916 |
| 0.69 | 0.65 | 2500 | 0.6837 | 0.2197 | 0.6545 | 0.1689 | 0.0834 | 0.6900 | 0.0855 | 0.3432 | -0.1193 | 0.1545 | -249.1985 | -257.9834 | -2.0760 | -2.1873 |
| 0.6967 | 0.68 | 2600 | 0.6840 | 0.2224 | 0.6551 | 0.1666 | 0.0824 | 0.6940 | 0.0842 | 0.3396 | -0.1165 | 0.1524 | -249.3017 | -258.2182 | -2.0761 | -2.1875 |
| 0.6728 | 0.71 | 2700 | 0.6821 | 0.1880 | 0.6566 | 0.1701 | 0.0892 | 0.6710 | 0.0809 | 0.3312 | -0.1169 | 0.1501 | -248.6199 | -257.8645 | -2.0769 | -2.1873 |
| 0.6719 | 0.73 | 2800 | 0.6800 | 0.1748 | 0.6558 | 0.1733 | 0.0903 | 0.6800 | 0.0830 | 0.3393 | -0.1211 | 0.1541 | -248.5062 | -257.5499 | -2.0781 | -2.1884 |
| 0.7023 | 0.76 | 2900 | 0.6865 | 0.2486 | 0.6528 | 0.1653 | 0.0755 | 0.6900 | 0.0898 | 0.3542 | -0.1207 | 0.1596 | -249.9836 | -258.3481 | -2.0760 | -2.1862 |
| 0.6635 | 0.79 | 3000 | 0.6847 | 0.2279 | 0.6533 | 0.1677 | 0.0793 | 0.6870 | 0.0885 | 0.3489 | -0.1193 | 0.1573 | -249.6095 | -258.1035 | -2.0810 | -2.1905 |
| 0.6855 | 0.81 | 3100 | 0.6828 | 0.2071 | 0.6537 | 0.1726 | 0.0848 | 0.6840 | 0.0878 | 0.3501 | -0.1204 | 0.1579 | -249.0582 | -257.6195 | -2.0781 | -2.1883 |
| 0.666 | 0.84 | 3200 | 0.6837 | 0.2206 | 0.6530 | 0.1705 | 0.0813 | 0.6850 | 0.0892 | 0.3528 | -0.1200 | 0.1588 | -249.4088 | -257.8239 | -2.0745 | -2.1844 |
| 0.6763 | 0.86 | 3300 | 0.6838 | 0.2216 | 0.6529 | 0.1705 | 0.0811 | 0.6820 | 0.0895 | 0.3537 | -0.1195 | 0.1588 | -249.4304 | -257.8217 | -2.0758 | -2.1856 |
| 0.6714 | 0.89 | 3400 | 0.6844 | 0.2307 | 0.6527 | 0.1704 | 0.0803 | 0.6850 | 0.0900 | 0.3554 | -0.1200 | 0.1595 | -249.5051 | -257.8407 | -2.0722 | -2.1824 |
| 0.7112 | 0.92 | 3500 | 0.6872 | 0.2662 | 0.6516 | 0.1664 | 0.0738 | 0.6870 | 0.0925 | 0.3616 | -0.1188 | 0.1616 | -250.1524 | -258.2368 | -2.0751 | -2.1849 |
| 0.6722 | 0.94 | 3600 | 0.6860 | 0.2528 | 0.6519 | 0.1677 | 0.0760 | 0.6870 | 0.0917 | 0.3591 | -0.1187 | 0.1607 | -249.9364 | -258.1039 | -2.0720 | -2.1821 |
| 0.6902 | 0.97 | 3700 | 0.6861 | 0.2512 | 0.6520 | 0.1680 | 0.0764 | 0.6880 | 0.0915 | 0.3587 | -0.1190 | 0.1606 | -249.8921 | -258.0789 | -2.0760 | -2.1857 |
| 0.6921 | 0.99 | 3800 | 0.6857 | 0.2497 | 0.6519 | 0.1680 | 0.0763 | 0.6910 | 0.0917 | 0.3589 | -0.1189 | 0.1607 | -249.9092 | -258.0792 | -2.0777 | -2.1871 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
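
Since PEFT appears among the framework versions, the weights are most likely published as an adapter on top of the base model. The sketch below shows one way such an adapter could be loaded. It assumes the repository id `just1nseo/eurus-dpop-qlora-uf-5e-7` hosts PEFT adapter weights for `openbmb/Eurus-7b-sft`. The `load_adapter` helper and the bfloat16 dtype are illustrative choices, not settings taken from this card.

```python
def load_adapter(
    adapter_id="just1nseo/eurus-dpop-qlora-uf-5e-7",  # assumed adapter repo
    base_id="openbmb/Eurus-7b-sft",                   # base model named on the card
):
    """Hypothetical helper: load the base model and attach the adapter."""
    # Imports are kept inside the function so the sketch can be read
    # (and imported) without the heavy dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # assumption: dtype chosen for illustration
    )
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer
```

Calling `load_adapter()` downloads both the base checkpoint and the adapter, so it needs network access and enough memory for a 7B-parameter model.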