---
base_model: stabilityai/StableBeluga-13B
tags:
- generated_from_trainer
model-index:
- name: PE-13b-full
  results: []
---

# PE-13b-full

This model is a fine-tuned version of [stabilityai/StableBeluga-13B](https://huggingface.co/stabilityai/StableBeluga-13B) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0094
- Rewards/chosen: -1.2833
- Rewards/rejected: -29.7294
- Rewards/accuracies: 0.9916
- Rewards/margins: 28.4460
- Logps/rejected: -121.9200
- Logps/chosen: -84.7524
- Logits/rejected: -2.1605
- Logits/chosen: -2.4403

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-07
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5085        | 0.05  | 100  | 0.4978          | 0.1241         | -0.3334          | 0.9525             | 0.4575          | -63.1282       | -81.9376     | -2.0870         | -2.3586       |
| 0.1966        | 0.09  | 200  | 0.2003          | 0.5022         | -1.3704          | 0.9804             | 1.8726          | -65.2020       | -81.1812     | -2.0918         | -2.3650       |
| 0.0612        | 0.14  | 300  | 0.0656          | 0.8997         | -3.3315          | 0.9888             | 4.2312          | -69.1243       | -80.3863     | -2.0887         | -2.3741       |
| 0.029         | 0.18  | 400  | 0.0356          | 0.9536         | -5.0607          | 0.9944             | 6.0143          | -72.5827       | -80.2785     | -2.0905         | -2.3804       |
| 0.0187        | 0.23  | 500  | 0.0201          | 0.9079         | -7.5059          | 0.9888             | 8.4139          | -77.4731       | -80.3699     | -2.0974         | -2.3915       |
| 0.0112        | 0.27  | 600  | 0.0130          | 0.7188         | -10.4500         | 0.9916             | 11.1688         | -83.3612       | -80.7481     | -2.0987         | -2.3960       |
| 0.0066        | 0.32  | 700  | 0.0102          | 0.6639         | -13.1345         | 0.9916             | 13.7984         | -88.7303       | -80.8579     | -2.1111         | -2.4104       |
| 0.0088        | 0.37  | 800  | 0.0098          | 0.9128         | -13.1977         | 0.9888             | 14.1105         | -88.8568       | -80.3601     | -2.1031         | -2.4030       |
| 0.0054        | 0.41  | 900  | 0.0092          | 0.6109         | -15.6398         | 0.9888             | 16.2507         | -93.7409       | -80.9640     | -2.1158         | -2.4144       |
| 0.0044        | 0.46  | 1000 | 0.0094          | 0.9982         | -16.0071         | 0.9916             | 17.0053         | -94.4755       | -80.1893     | -2.0988         | -2.3946       |
| 0.0061        | 0.5   | 1100 | 0.0089          | 0.5504         | -18.0125         | 0.9916             | 18.5630         | -98.4864       | -81.0849     | -2.0991         | -2.3955       |
| 0.024         | 0.55  | 1200 | 0.0088          | 0.4877         | -16.6683         | 0.9916             | 17.1561         | -95.7980       | -81.2103     | -2.0748         | -2.3633       |
| 0.0039        | 0.59  | 1300 | 0.0087          | 0.3755         | -18.5093         | 0.9916             | 18.8848         | -99.4799       | -81.4347     | -2.0746         | -2.3623       |
| 0.0051        | 0.64  | 1400 | 0.0086          | 0.1176         | -20.5558         | 0.9916             | 20.6734         | -103.5730      | -81.9506     | -2.0819         | -2.3738       |
| 0.0023        | 0.68  | 1500 | 0.0089          | 0.1552         | -20.0740         | 0.9888             | 20.2292         | -102.6092      | -81.8754     | -2.0813         | -2.3667       |
| 0.0027        | 0.73  | 1600 | 0.0089          | -0.5025        | -20.7978         | 0.9888             | 20.2953         | -104.0569      | -83.1908     | -2.1179         | -2.4078       |
| 0.0031        | 0.78  | 1700 | 0.0085          | -0.6314        | -21.0492         | 0.9916             | 20.4178         | -104.5597      | -83.4485     | -2.0915         | -2.3773       |
| 0.0049        | 0.82  | 1800 | 0.0085          | -0.7786        | -21.3333         | 0.9916             | 20.5547         | -105.1278      | -83.7429     | -2.0670         | -2.3504       |
| 0.0023        | 0.87  | 1900 | 0.0084          | -0.7496        | -22.3377         | 0.9944             | 21.5880         | -107.1367      | -83.6850     | -2.0729         | -2.3547       |
| 0.0067        | 0.91  | 2000 | 0.0086          | -0.8126        | -22.8024         | 0.9916             | 21.9899         | -108.0662      | -83.8109     | -2.0651         | -2.3472       |
| 0.0041        | 0.96  | 2100 | 0.0082          | -0.7903        | -21.8379         | 0.9944             | 21.0476         | -106.1371      | -83.7663     | -2.0363         | -2.3137       |
| 0.0025        | 1.0   | 2200 | 0.0079          | -0.4489        | -21.4451         | 0.9916             | 20.9963         | -105.3516      | -83.0835     | -2.0303         | -2.3074       |
| 0.0023        | 1.05  | 2300 | 0.0082          | -1.1267        | -22.7620         | 0.9944             | 21.6353         | -107.9852      | -84.4391     | -2.0477         | -2.3260       |
| 0.0055        | 1.1   | 2400 | 0.0085          | -1.4969        | -24.0568         | 0.9888             | 22.5599         | -110.5749      | -85.1796     | -2.0616         | -2.3384       |
| 0.0139        | 1.14  | 2500 | 0.0077          | 0.4564         | -20.3860         | 0.9916             | 20.8424         | -103.2333      | -81.2730     | -2.0453         | -2.3206       |
| 0.0023        | 1.19  | 2600 | 0.0081          | 0.0858         | -21.9640         | 0.9916             | 22.0498         | -106.3893      | -82.0141     | -2.0528         | -2.3273       |
| 0.0046        | 1.23  | 2700 | 0.0083          | -0.2543        | -23.4016         | 0.9916             | 23.1473         | -109.2646      | -82.6943     | -2.0668         | -2.3457       |
| 0.0033        | 1.28  | 2800 | 0.0083          | -0.3317        | -23.7872         | 0.9916             | 23.4555         | -110.0356      | -82.8491     | -2.0884         | -2.3650       |
| 0.0023        | 1.32  | 2900 | 0.0084          | -0.2753        | -24.3682         | 0.9916             | 24.0929         | -111.1976      | -82.7362     | -2.1054         | -2.3879       |
| 0.0034        | 1.37  | 3000 | 0.0081          | 0.4328         | -23.3162         | 0.9916             | 23.7491         | -109.0938      | -81.3201     | -2.0817         | -2.3565       |
| 0.0033        | 1.42  | 3100 | 0.0082          | -0.0254        | -23.7390         | 0.9944             | 23.7136         | -109.9394      | -82.2366     | -2.0706         | -2.3447       |
| 0.0033        | 1.46  | 3200 | 0.0086          | -0.7680        | -24.0452         | 0.9916             | 23.2772         | -110.5517      | -83.7218     | -2.0760         | -2.3543       |
| 0.0032        | 1.51  | 3300 | 0.0086          | -0.0016        | -23.5161         | 0.9944             | 23.5145         | -109.4934      | -82.1889     | -2.0881         | -2.3655       |
| 0.0011        | 1.55  | 3400 | 0.0084          | 0.0195         | -24.2635         | 0.9944             | 24.2831         | -110.9884      | -82.1467     | -2.0878         | -2.3667       |
| 0.0002        | 1.6   | 3500 | 0.0087          | 0.0421         | -24.8306         | 0.9916             | 24.8728         | -112.1225      | -82.1015     | -2.0890         | -2.3698       |
| 0.0034        | 1.64  | 3600 | 0.0086          | -0.2729        | -25.8106         | 0.9916             | 25.5377         | -114.0825      | -82.7315     | -2.1030         | -2.3851       |
| 0.0027        | 1.69  | 3700 | 0.0086          | 0.0339         | -25.0221         | 0.9916             | 25.0560         | -112.5055      | -82.1179     | -2.1300         | -2.4147       |
| 0.0056        | 1.73  | 3800 | 0.0082          | 0.1800         | -23.6173         | 0.9916             | 23.7974         | -109.6960      | -81.8257     | -2.1140         | -2.3980       |
| 0.0026        | 1.78  | 3900 | 0.0083          | -0.0334        | -24.6060         | 0.9944             | 24.5725         | -111.6733      | -82.2526     | -2.1140         | -2.3965       |
| 0.0036        | 1.83  | 4000 | 0.0080          | -0.2511        | -23.0433         | 0.9916             | 22.7923         | -108.5479      | -82.6879     | -2.1348         | -2.4167       |
| 0.0044        | 1.87  | 4100 | 0.0084          | -0.4259        | -23.7811         | 0.9916             | 23.3551         | -110.0234      | -83.0376     | -2.1314         | -2.4160       |
| 0.0022        | 1.92  | 4200 | 0.0083          | -0.5710        | -23.2360         | 0.9944             | 22.6650         | -108.9332      | -83.3277     | -2.1369         | -2.4196       |
| 0.0044        | 1.96  | 4300 | 0.0085          | -0.6363        | -24.6474         | 0.9972             | 24.0111         | -111.7560      | -83.4583     | -2.1307         | -2.4109       |
| 0.0023        | 2.01  | 4400 | 0.0085          | -0.6133        | -24.9492         | 0.9916             | 24.3359         | -112.3597      | -83.4124     | -2.1322         | -2.4134       |
| 0.0033        | 2.05  | 4500 | 0.0085          | -0.7101        | -25.5054         | 0.9916             | 24.7953         | -113.4721      | -83.6059     | -2.1326         | -2.4142       |
| 0.0023        | 2.1   | 4600 | 0.0087          | -0.7855        | -26.0511         | 0.9916             | 25.2656         | -114.5634      | -83.7567     | -2.1333         | -2.4152       |
| 0.0011        | 2.15  | 4700 | 0.0088          | -0.9006        | -26.5845         | 0.9944             | 25.6839         | -115.6303      | -83.9870     | -2.1369         | -2.4198       |
| 0.0065        | 2.19  | 4800 | 0.0088          | -0.7570        | -26.8960         | 0.9916             | 26.1390         | -116.2533      | -83.6997     | -2.1393         | -2.4198       |
| 0.0022        | 2.24  | 4900 | 0.0091          | -0.9581        | -27.9431         | 0.9916             | 26.9850         | -118.3475      | -84.1019     | -2.1428         | -2.4245       |
| 0.0026        | 2.28  | 5000 | 0.0091          | -1.2522        | -28.8309         | 0.9944             | 27.5788         | -120.1232      | -84.6901     | -2.1479         | -2.4287       |
| 0.0033        | 2.33  | 5100 | 0.0089          | -0.8602        | -28.7323         | 0.9916             | 27.8721         | -119.9259      | -83.9062     | -2.1522         | -2.4328       |
| 0.0041        | 2.37  | 5200 | 0.0091          | -1.0405        | -29.2861         | 0.9916             | 28.2456         | -121.0335      | -84.2668     | -2.1536         | -2.4343       |
| 0.0023        | 2.42  | 5300 | 0.0093          | -1.1323        | -29.5240         | 0.9916             | 28.3917         | -121.5093      | -84.4504     | -2.1529         | -2.4336       |
| 0.0022        | 2.46  | 5400 | 0.0092          | -1.2202        | -29.2127         | 0.9916             | 27.9925         | -120.8866      | -84.6261     | -2.1595         | -2.4416       |
| 0.0           | 2.51  | 5500 | 0.0093          | -1.4371        | -29.7063         | 0.9916             | 28.2692         | -121.8739      | -85.0599     | -2.1609         | -2.4404       |
| 0.0022        | 2.56  | 5600 | 0.0095          | -1.4397        | -30.0202         | 0.9944             | 28.5804         | -122.5016      | -85.0652     | -2.1584         | -2.4383       |
| 0.0011        | 2.6   | 5700 | 0.0096          | -1.6125        | -30.0945         | 0.9916             | 28.4820         | -122.6504      | -85.4108     | -2.1601         | -2.4395       |
| 0.0053        | 2.65  | 5800 | 0.0095          | -1.5638        | -30.0025         | 0.9944             | 28.4387         | -122.4663      | -85.3133     | -2.1615         | -2.4398       |
| 0.003         | 2.69  | 5900 | 0.0095          | -1.5904        | -30.1980         | 0.9916             | 28.6076         | -122.8572      | -85.3666     | -2.1606         | -2.4406       |
| 0.0011        | 2.74  | 6000 | 0.0094          | -1.5286        | -30.0882         | 0.9944             | 28.5596         | -122.6377      | -85.2429     | -2.1615         | -2.4403       |
| 0.0008        | 2.78  | 6100 | 0.0095          | -1.4405        | -30.0174         | 0.9916             | 28.5769         | -122.4961      | -85.0667     | -2.1615         | -2.4400       |
| 0.0022        | 2.83  | 6200 | 0.0093          | -1.3508        | -29.9317         | 0.9916             | 28.5808         | -122.3246      | -84.8874     | -2.1599         | -2.4395       |
| 0.0019        | 2.88  | 6300 | 0.0093          | -1.2416        | -29.6525         | 0.9916             | 28.4109         | -121.7663      | -84.6690     | -2.1620         | -2.4415       |
| 0.0034        | 2.92  | 6400 | 0.0093          | -1.2995        | -29.7927         | 0.9916             | 28.4932         | -122.0468      | -84.7848     | -2.1616         | -2.4412       |
| 0.0014        | 2.97  | 6500 | 0.0092          | -1.2574        | -29.7200         | 0.9916             | 28.4626         | -121.9014      | -84.7006     | -2.1595         | -2.4408       |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
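### Interpreting the reward metrics

The `Rewards/*` and `Logps/*` column names above follow the convention used by preference-optimization trainers such as TRL's `DPOTrainer`, which suggests a DPO-style objective. As a minimal sketch of how those columns relate under standard DPO (assuming the usual beta-scaled log-probability ratios; the beta value below is a placeholder, since the card does not state it), the per-pair rewards, margin, and loss can be computed as:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (sketch; beta is assumed)."""
    # Implicit rewards: beta-scaled log-prob ratio against the reference model
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) = log(1 + exp(-margin)):
    # near zero when the chosen response clearly outranks the rejected one
    loss = math.log1p(math.exp(-margin))
    return loss, reward_chosen, reward_rejected, margin
```

Under this reading, `Rewards/margins` is simply `Rewards/chosen - Rewards/rejected` (e.g. the final eval row: -1.2833 - (-29.7294) ≈ 28.4460), and the near-zero validation loss reflects the very large margin.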