# eurus-dpo-qlora-uffull-5e-6
This model is a fine-tuned version of openbmb/Eurus-7b-sft on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a sketch of how these DPO metrics relate to one another follows the list):
- Loss: 0.5127
- Rewards/chosen: -0.9791
- Rewards/rejected: -1.9966
- Rewards/accuracies: 0.7540
- Rewards/margins: 1.0174
- Rewards/margins Max: 3.5694
- Rewards/margins Min: -0.9504
- Rewards/margins Std: 1.5237
- Logps/rejected: -462.4769
- Logps/chosen: -373.6858
- Logits/rejected: -2.0066
- Logits/chosen: -2.1034
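The metric names above follow the standard DPO formulation: the "rewards" are the policy's log-probability shift relative to the reference model, scaled by beta; the loss is a logistic loss on the chosen-vs-rejected reward margin; Logps/* are the policy's summed log-probabilities of the chosen and rejected completions. The sketch below is illustrative only; the beta value and function names are assumptions, not values recorded for this run.

```python
# Illustrative sketch of how the reported DPO metrics are typically computed.
# beta=0.1 is an assumed value; it is not stated in this card.
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Inputs are per-sequence summed token log-probabilities (shape [batch])."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards      # Rewards/margins (Max/Min/Std are batch statistics)
    accuracy = (margins > 0).float().mean()          # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()             # Loss (sigmoid DPO objective)
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracy
```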
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (an equivalent trainer configuration is sketched after this list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
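For reference, a minimal TRL `DPOTrainer` setup consistent with the hyperparameters above is sketched below. The 4-bit quantization settings, LoRA rank/alpha, DPO beta, bf16 precision, and the simple text rendering of the preference pairs are assumptions and are not recorded in this card; the per-device batch sizes follow the 4-GPU layout listed above.

```python
# Minimal sketch of a TRL DPOTrainer setup consistent with the listed hyperparameters.
# Assumptions (not recorded in this card): 4-bit NF4 quantization, LoRA r=16/alpha=16,
# beta=0.1, and bf16 compute.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_id = "openbmb/Eurus-7b-sft"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ultrafeedback_binarized stores chosen/rejected as message lists; render them to plain
# text here (a chat template could be applied instead).
def to_text(example):
    example["chosen"] = example["chosen"][-1]["content"]
    example["rejected"] = example["rejected"][-1]["content"]
    return example

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized").map(to_text)

training_args = TrainingArguments(
    output_dir="eurus-dpo-qlora-uffull-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # 4 per device x 4 GPUs = 16 total
    per_device_eval_batch_size=8,    # 8 per device x 4 GPUs = 32 total
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumed precision
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # with a LoRA adapter, the frozen base model serves as the reference
    args=training_args,
    beta=0.1,                        # assumed DPO beta
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"),
)
trainer.train()
```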
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6864 | 0.03 | 100 | 0.6880 | -0.0140 | -0.0283 | 0.6329 | 0.0143 | 0.0966 | -0.0527 | 0.0482 | -265.6463 | -277.1725 | -2.2230 | -2.3332 |
0.6729 | 0.05 | 200 | 0.6675 | -0.1633 | -0.2510 | 0.6627 | 0.0877 | 0.5034 | -0.2742 | 0.2543 | -287.9178 | -292.1004 | -2.1945 | -2.3031 |
0.6516 | 0.08 | 300 | 0.6332 | -0.2864 | -0.4906 | 0.6905 | 0.2042 | 0.8657 | -0.3947 | 0.4208 | -311.8771 | -304.4155 | -2.1827 | -2.2904 |
0.6259 | 0.1 | 400 | 0.6459 | -1.4444 | -2.0134 | 0.6488 | 0.5690 | 2.7419 | -1.2404 | 1.3151 | -464.1583 | -420.2169 | -2.0161 | -2.1158 |
0.5981 | 0.13 | 500 | 0.5951 | -0.4738 | -0.8890 | 0.7004 | 0.4151 | 1.7169 | -0.5423 | 0.7476 | -351.7183 | -323.1576 | -2.0982 | -2.2026 |
0.5825 | 0.16 | 600 | 0.6147 | -1.4298 | -2.1755 | 0.6766 | 0.7458 | 3.1883 | -1.2023 | 1.4469 | -480.3750 | -418.7514 | -1.9080 | -2.0118 |
0.6157 | 0.18 | 700 | 0.5762 | -1.0422 | -1.6487 | 0.7044 | 0.6066 | 2.5214 | -0.8306 | 1.1064 | -427.6948 | -379.9899 | -1.8007 | -1.8987 |
0.5937 | 0.21 | 800 | 0.5623 | -0.6723 | -1.2169 | 0.7242 | 0.5447 | 2.0184 | -0.5908 | 0.8750 | -384.5144 | -343.0002 | -1.9444 | -2.0444 |
0.5394 | 0.24 | 900 | 0.5627 | -1.0989 | -1.9261 | 0.7302 | 0.8273 | 3.2426 | -0.8732 | 1.3769 | -455.4331 | -385.6613 | -2.0832 | -2.1830 |
0.6262 | 0.26 | 1000 | 0.5604 | -1.1248 | -1.9857 | 0.7143 | 0.8609 | 3.4243 | -0.9201 | 1.4521 | -461.3933 | -388.2573 | -1.9102 | -2.0114 |
0.5723 | 0.29 | 1100 | 0.5496 | -0.7408 | -1.5482 | 0.7381 | 0.8074 | 3.2334 | -0.6981 | 1.3203 | -417.6383 | -349.8509 | -1.9847 | -2.0879 |
0.5501 | 0.31 | 1200 | 0.5542 | -0.6061 | -1.1959 | 0.7321 | 0.5899 | 2.1036 | -0.5358 | 0.8885 | -382.4131 | -336.3819 | -1.8930 | -1.9914 |
0.5382 | 0.34 | 1300 | 0.5417 | -1.1698 | -2.0706 | 0.7460 | 0.9008 | 3.3611 | -0.9081 | 1.4208 | -469.8816 | -392.7588 | -1.7319 | -1.8331 |
0.5759 | 0.37 | 1400 | 0.5406 | -0.9231 | -1.8635 | 0.7401 | 0.9404 | 3.5157 | -0.8329 | 1.4521 | -449.1679 | -368.0823 | -1.8351 | -1.9399 |
0.5367 | 0.39 | 1500 | 0.5376 | -0.8430 | -1.7065 | 0.7560 | 0.8635 | 3.1796 | -0.8328 | 1.3201 | -433.4751 | -360.0789 | -1.8587 | -1.9608 |
0.5345 | 0.42 | 1600 | 0.5269 | -0.8832 | -1.7856 | 0.7381 | 0.9024 | 3.3303 | -0.8483 | 1.3858 | -441.3758 | -364.0924 | -1.8133 | -1.9167 |
0.5132 | 0.44 | 1700 | 0.5339 | -1.0951 | -2.0179 | 0.7540 | 0.9228 | 3.2850 | -0.9130 | 1.4005 | -464.6132 | -385.2873 | -1.8670 | -1.9681 |
0.5451 | 0.47 | 1800 | 0.5310 | -0.7777 | -1.6911 | 0.7282 | 0.9135 | 3.4268 | -0.8127 | 1.4169 | -431.9351 | -353.5432 | -1.8431 | -1.9515 |
0.5126 | 0.5 | 1900 | 0.5315 | -1.0683 | -2.0616 | 0.7302 | 0.9933 | 3.6236 | -0.9938 | 1.5447 | -468.9817 | -382.6060 | -1.8568 | -1.9592 |
0.5173 | 0.52 | 2000 | 0.5273 | -0.9246 | -1.8103 | 0.7421 | 0.8857 | 3.2625 | -0.9327 | 1.3899 | -443.8511 | -368.2305 | -1.9264 | -2.0273 |
0.5241 | 0.55 | 2100 | 0.5267 | -1.0388 | -2.0045 | 0.7262 | 0.9657 | 3.5894 | -1.0169 | 1.5350 | -463.2707 | -379.6525 | -1.9509 | -2.0505 |
0.4912 | 0.58 | 2200 | 0.5236 | -1.0773 | -2.1473 | 0.7460 | 1.0699 | 3.9227 | -1.0592 | 1.6634 | -477.5478 | -383.5082 | -1.9172 | -2.0173 |
0.5792 | 0.6 | 2300 | 0.5177 | -0.8715 | -1.7418 | 0.7361 | 0.8703 | 3.0821 | -0.8725 | 1.3249 | -436.9993 | -362.9194 | -2.0500 | -2.1480 |
0.5628 | 0.63 | 2400 | 0.5218 | -0.9891 | -1.9917 | 0.7460 | 1.0026 | 3.6936 | -1.0654 | 1.5794 | -461.9902 | -374.6792 | -2.0218 | -2.1218 |
0.5217 | 0.65 | 2500 | 0.5324 | -1.2240 | -2.4529 | 0.7480 | 1.2290 | 4.5548 | -1.2387 | 1.9354 | -508.1148 | -398.1707 | -1.9639 | -2.0649 |
0.581 | 0.68 | 2600 | 0.5199 | -0.9497 | -1.9408 | 0.7381 | 0.9910 | 3.5052 | -0.9698 | 1.5040 | -456.8956 | -370.7460 | -1.9873 | -2.0864 |
0.518 | 0.71 | 2700 | 0.5212 | -1.0617 | -2.1128 | 0.7401 | 1.0511 | 3.7114 | -1.0556 | 1.6114 | -474.0986 | -381.9437 | -1.9898 | -2.0884 |
0.5646 | 0.73 | 2800 | 0.5173 | -0.9139 | -1.8873 | 0.7401 | 0.9734 | 3.4192 | -0.9267 | 1.4687 | -451.5462 | -367.1606 | -1.9649 | -2.0632 |
0.5608 | 0.76 | 2900 | 0.5170 | -1.0090 | -2.0514 | 0.7421 | 1.0424 | 3.6819 | -1.0248 | 1.5843 | -467.9605 | -376.6732 | -1.9805 | -2.0788 |
0.4166 | 0.79 | 3000 | 0.5134 | -0.9849 | -1.9772 | 0.7421 | 0.9923 | 3.4268 | -0.9556 | 1.4828 | -460.5416 | -374.2640 | -1.9769 | -2.0737 |
0.5672 | 0.81 | 3100 | 0.5129 | -0.9737 | -1.9738 | 0.7520 | 1.0001 | 3.4737 | -0.9442 | 1.4902 | -460.2002 | -373.1453 | -1.9761 | -2.0727 |
0.4843 | 0.84 | 3200 | 0.5127 | -0.9899 | -1.9951 | 0.7480 | 1.0053 | 3.4925 | -0.9434 | 1.4955 | -462.3347 | -374.7598 | -1.9879 | -2.0844 |
0.5234 | 0.86 | 3300 | 0.5123 | -0.9618 | -1.9579 | 0.7480 | 0.9961 | 3.4685 | -0.9316 | 1.4824 | -458.6060 | -371.9529 | -2.0078 | -2.1041 |
0.4751 | 0.89 | 3400 | 0.5128 | -0.9715 | -1.9858 | 0.7480 | 1.0143 | 3.5545 | -0.9477 | 1.5159 | -461.4002 | -372.9207 | -2.0063 | -2.1035 |
0.5294 | 0.92 | 3500 | 0.5131 | -0.9928 | -2.0226 | 0.7460 | 1.0298 | 3.6184 | -0.9685 | 1.5451 | -465.0800 | -375.0580 | -2.0043 | -2.1015 |
0.5066 | 0.94 | 3600 | 0.5129 | -0.9814 | -2.0001 | 0.75 | 1.0187 | 3.5761 | -0.9557 | 1.5271 | -462.8294 | -373.9119 | -2.0121 | -2.1084 |
0.5396 | 0.97 | 3700 | 0.5126 | -0.9787 | -1.9952 | 0.7520 | 1.0165 | 3.5676 | -0.9529 | 1.5231 | -462.3404 | -373.6405 | -2.0075 | -2.1043 |
0.5374 | 0.99 | 3800 | 0.5127 | -0.9798 | -1.9982 | 0.75 | 1.0185 | 3.5723 | -0.9502 | 1.5244 | -462.6427 | -373.7504 | -2.0092 | -2.1060 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
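With the PEFT and Transformers versions above (or newer), the adapter can be loaded on top of the base model roughly as follows. This is a minimal usage sketch: the adapter repository id is the one named in this card's title, and the prompt and generation settings are illustrative.

```python
# Minimal usage sketch: load the LoRA adapter from this repo on top of the base model.
# In practice, format the prompt with the base model's own chat/prompt template.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "openbmb/Eurus-7b-sft", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "just1nseo/eurus-dpo-qlora-uffull-5e-6")
tokenizer = AutoTokenizer.from_pretrained("openbmb/Eurus-7b-sft")

prompt = "Explain why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```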