zephyr-7b-gpo-v0-i1
This model is a fine-tuned version of DUAL-GPO/zephyr-7b-gpo-update3-i0 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.1128
- Rewards/chosen: -0.3200
- Rewards/rejected: -0.3706
- Rewards/accuracies: 0.4955
- Rewards/margins: 0.0506
- Logps/rejected: -621.5818
- Logps/chosen: -585.8446
- Logits/rejected: -1.9142
- Logits/chosen: -2.0965
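The reported reward margin is consistent with the difference between the chosen and rejected rewards. The card does not define these metrics, so the snippet below is only a sketch, assuming the common convention in preference-optimization logs that margin = chosen reward minus rejected reward. The variable names are illustrative.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.3200
rewards_rejected = -0.3706

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")  # margin = 0.0506
```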
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 2
- total_train_batch_size: 12
- total_eval_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
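The batch-size and scheduler entries above can be related with a little arithmetic. The following is a minimal sketch, not the training code: it reproduces the effective batch size from the listed values and illustrates a cosine schedule with linear warmup. The function `cosine_lr` and the value of `total_steps` are hypothetical and chosen only for illustration; the card does not state the total number of optimizer steps, and the exact schedule shape used in training is assumed.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 2              # per-device batch size
num_devices = 3
gradient_accumulation_steps = 2
learning_rate = 5e-06
warmup_ratio = 0.1

# Effective number of examples per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 12


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count for illustration; not taken from the card.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps))
```

Under these assumptions the learning rate peaks at 5e-06 when warmup ends (10% of the run) and decays toward zero by the final step.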
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.3416 | 0.02 | 100 | 0.0447 | -0.0994 | -0.1161 | 0.5883 | 0.0167 | -367.1221 | -365.3260 | -1.7202 | -1.8827 |
0.2571 | 0.05 | 200 | 0.0858 | -0.1849 | -0.2159 | 0.4790 | 0.0310 | -466.8627 | -450.7509 | -1.8599 | -2.0364 |
0.2771 | 0.07 | 300 | 0.0910 | -0.2419 | -0.2769 | 0.4775 | 0.0350 | -527.8735 | -507.7906 | -1.9087 | -2.0909 |
0.2561 | 0.1 | 400 | 0.1127 | -0.4661 | -0.5086 | 0.4895 | 0.0425 | -759.5652 | -731.9658 | -1.9571 | -2.1511 |
0.2604 | 0.12 | 500 | 0.0826 | -0.3221 | -0.3613 | 0.4835 | 0.0393 | -612.2919 | -587.9281 | -1.8643 | -2.0449 |
0.2778 | 0.14 | 600 | 0.1033 | -0.2940 | -0.3303 | 0.4760 | 0.0363 | -581.3212 | -559.9218 | -1.8588 | -2.0387 |
0.2631 | 0.17 | 700 | 0.1084 | -0.3587 | -0.4024 | 0.4865 | 0.0437 | -653.3798 | -624.5897 | -1.8458 | -2.0252 |
0.2264 | 0.19 | 800 | 0.1158 | -0.2355 | -0.2734 | 0.4731 | 0.0378 | -524.3303 | -501.3899 | -1.8726 | -2.0501 |
0.2593 | 0.22 | 900 | 0.1048 | -0.2730 | -0.3214 | 0.4865 | 0.0485 | -572.4186 | -538.8648 | -1.7883 | -1.9593 |
0.2248 | 0.24 | 1000 | 0.1122 | -0.2753 | -0.3216 | 0.4760 | 0.0463 | -572.5806 | -541.1548 | -1.8308 | -2.0088 |
0.2345 | 0.26 | 1100 | 0.1249 | -0.2594 | -0.2977 | 0.4581 | 0.0382 | -548.6310 | -525.3046 | -1.8628 | -2.0406 |
0.2 | 0.29 | 1200 | 0.1212 | -0.3796 | -0.4250 | 0.4925 | 0.0454 | -675.9450 | -645.4562 | -1.8382 | -2.0177 |
0.2246 | 0.31 | 1300 | 0.1102 | -0.2548 | -0.3030 | 0.4850 | 0.0482 | -553.9783 | -520.6531 | -1.9584 | -2.1449 |
0.2481 | 0.34 | 1400 | 0.1082 | -0.2988 | -0.3545 | 0.4955 | 0.0557 | -605.4994 | -564.6545 | -1.8877 | -2.0708 |
0.232 | 0.36 | 1500 | 0.1053 | -0.2421 | -0.2907 | 0.4910 | 0.0486 | -541.7161 | -508.0170 | -1.9404 | -2.1256 |
0.2351 | 0.38 | 1600 | 0.1098 | -0.3383 | -0.3864 | 0.4775 | 0.0481 | -637.3510 | -604.1564 | -1.8506 | -2.0290 |
0.2622 | 0.41 | 1700 | 0.1196 | -0.2614 | -0.3121 | 0.4820 | 0.0507 | -563.0452 | -527.2568 | -1.9197 | -2.1016 |
0.2043 | 0.43 | 1800 | 0.1257 | -0.2798 | -0.3252 | 0.4820 | 0.0454 | -576.1965 | -545.7018 | -1.9177 | -2.0980 |
0.2205 | 0.46 | 1900 | 0.1154 | -0.4037 | -0.4629 | 0.4850 | 0.0592 | -713.9170 | -669.5957 | -1.8198 | -1.9972 |
0.2156 | 0.48 | 2000 | 0.1103 | -0.2727 | -0.3161 | 0.4865 | 0.0434 | -567.0794 | -538.5911 | -1.9234 | -2.1044 |
0.2308 | 0.5 | 2100 | 0.1163 | -0.4322 | -0.4852 | 0.4925 | 0.0531 | -736.1898 | -698.0287 | -1.8013 | -1.9761 |
0.2204 | 0.53 | 2200 | 0.1083 | -0.3224 | -0.3712 | 0.4940 | 0.0488 | -622.1750 | -588.3229 | -1.8487 | -2.0260 |
0.2303 | 0.55 | 2300 | 0.1192 | -0.3117 | -0.3667 | 0.4940 | 0.0551 | -617.7075 | -577.5367 | -1.8679 | -2.0473 |
0.231 | 0.58 | 2400 | 0.1068 | -0.3476 | -0.4008 | 0.5 | 0.0532 | -651.7600 | -613.4935 | -1.8167 | -1.9926 |
0.2252 | 0.6 | 2500 | 0.1240 | -0.3568 | -0.4154 | 0.4940 | 0.0586 | -666.3873 | -622.7224 | -1.9124 | -2.0972 |
0.2445 | 0.62 | 2600 | 0.1240 | -0.3426 | -0.4003 | 0.4805 | 0.0576 | -651.2365 | -608.5200 | -1.9230 | -2.1073 |
0.2212 | 0.65 | 2700 | 0.1103 | -0.2894 | -0.3362 | 0.4925 | 0.0468 | -587.1506 | -555.2968 | -1.9049 | -2.0860 |
0.2301 | 0.67 | 2800 | 0.1073 | -0.2754 | -0.3278 | 0.5105 | 0.0524 | -578.7745 | -541.2313 | -1.9024 | -2.0838 |
0.2099 | 0.7 | 2900 | 0.1191 | -0.3108 | -0.3657 | 0.5015 | 0.0549 | -616.7156 | -576.6858 | -1.9182 | -2.1014 |
0.2072 | 0.72 | 3000 | 0.1120 | -0.3062 | -0.3563 | 0.4910 | 0.0500 | -607.2319 | -572.1099 | -1.9258 | -2.1090 |
0.2186 | 0.74 | 3100 | 0.1155 | -0.2960 | -0.3474 | 0.4985 | 0.0514 | -598.4005 | -561.9234 | -1.9031 | -2.0849 |
0.2743 | 0.77 | 3200 | 0.1121 | -0.2815 | -0.3314 | 0.4955 | 0.0499 | -582.3980 | -547.4086 | -1.9332 | -2.1170 |
0.1989 | 0.79 | 3300 | 0.1116 | -0.3235 | -0.3744 | 0.4850 | 0.0509 | -625.3889 | -589.4213 | -1.8977 | -2.0789 |
0.2258 | 0.82 | 3400 | 0.1093 | -0.3091 | -0.3603 | 0.4970 | 0.0512 | -611.2418 | -574.9766 | -1.9164 | -2.0989 |
0.2524 | 0.84 | 3500 | 0.1142 | -0.3383 | -0.3897 | 0.4910 | 0.0514 | -640.6893 | -604.2028 | -1.9130 | -2.0956 |
0.2202 | 0.86 | 3600 | 0.1173 | -0.3412 | -0.3925 | 0.4835 | 0.0513 | -643.4937 | -607.1244 | -1.9146 | -2.0973 |
0.2365 | 0.89 | 3700 | 0.1178 | -0.3273 | -0.3787 | 0.4850 | 0.0514 | -629.6786 | -593.2114 | -1.9279 | -2.1117 |
0.1894 | 0.91 | 3800 | 0.1152 | -0.3184 | -0.3694 | 0.4925 | 0.0509 | -620.3304 | -584.3237 | -1.9252 | -2.1088 |
0.2372 | 0.94 | 3900 | 0.1130 | -0.3155 | -0.3658 | 0.4940 | 0.0503 | -616.7926 | -581.3542 | -1.9194 | -2.1021 |
0.2029 | 0.96 | 4000 | 0.1133 | -0.3208 | -0.3715 | 0.4925 | 0.0507 | -622.4911 | -586.6887 | -1.9141 | -2.0964 |
0.2438 | 0.98 | 4100 | 0.1129 | -0.3199 | -0.3707 | 0.4940 | 0.0508 | -621.6636 | -585.7551 | -1.9140 | -2.0965 |
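For readers who want to analyze the log programmatically, the rows above can be parsed as pipe-delimited text. The sketch below uses three rows copied from the table (steps 100, 1900, and 4100) and picks the one with the largest reward margin among them. The helper `parse_row` is a hypothetical name introduced for illustration.

```python
# Column names as they appear in the table header above.
COLUMNS = [
    "Training Loss", "Epoch", "Step", "Validation Loss",
    "Rewards/chosen", "Rewards/rejected", "Rewards/accuracies",
    "Rewards/margins", "Logps/rejected", "Logps/chosen",
    "Logits/rejected", "Logits/chosen",
]

# Three rows copied verbatim from the table; paste more rows as needed.
ROWS = """
0.3416 | 0.02 | 100 | 0.0447 | -0.0994 | -0.1161 | 0.5883 | 0.0167 | -367.1221 | -365.3260 | -1.7202 | -1.8827 |
0.2205 | 0.46 | 1900 | 0.1154 | -0.4037 | -0.4629 | 0.4850 | 0.0592 | -713.9170 | -669.5957 | -1.8198 | -1.9972 |
0.2438 | 0.98 | 4100 | 0.1129 | -0.3199 | -0.3707 | 0.4940 | 0.0508 | -621.6636 | -585.7551 | -1.9140 | -2.0965 |
"""


def parse_row(line: str) -> dict:
    """Split one pipe-delimited row into a {column name: float} mapping."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    return dict(zip(COLUMNS, map(float, cells)))


records = [parse_row(line) for line in ROWS.strip().splitlines()]

# Row with the largest reward margin among the rows included here.
best = max(records, key=lambda r: r["Rewards/margins"])
print(int(best["Step"]), best["Rewards/margins"])
```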
Framework versions
- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
Model tree for DUAL-GPO/zephyr-7b-gpo-v0-i1
- Base model: mistralai/Mistral-7B-v0.1