tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs_old
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset, trained with Direct Preference Optimization (DPO). It achieves the following results on the evaluation set:
- Loss: 0.6549
- Rewards/chosen: -0.4976
- Rewards/rejected: -0.6011
- Rewards/accuracies: 0.6194
- Rewards/margins: 0.1035
- Logps/rejected: -123.2918
- Logps/chosen: -108.4708
- Logits/rejected: -2.5511
- Logits/chosen: -2.5579
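A minimal inference sketch is shown below. It assumes the checkpoint is published under the Hugging Face repo id martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs_old and uses the standard causal-LM interface; the TL;DR prompt format is an assumption and is not documented in this card.

```python
# Hedged sketch: the repo id and prompt format below are assumptions, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs_old"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)  # dtype optional

post = "..."  # the post or article to summarize
prompt = f"{post}\n\nTL;DR:"  # assumed summarization prompt format

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens.
summary = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)
```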
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch using these values appears after the list):
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
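The sketch below shows one way these hyperparameters could map onto a trl DPOConfig / DPOTrainer setup. Only the values listed above come from this card; the trl version, beta, and the preference-pair field mapping for openai/summarize_from_feedback are assumptions, and the exact trainer keyword names vary across trl releases.

```python
# Hedged sketch, not the author's exact training script. beta and the dataset
# field mapping are assumptions; hyperparameter values are taken from the list above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

def to_preference(example):
    # Assumed field mapping for the "comparisons" configuration of the dataset.
    post = example["info"]["post"] or example["info"]["article"]
    prompt = f"{post}\n\nTL;DR:"
    choice = example["choice"]
    return {
        "prompt": prompt,
        "chosen": example["summaries"][choice]["text"],
        "rejected": example["summaries"][1 - choice]["text"],
    }

raw = load_dataset("openai/summarize_from_feedback", "comparisons")
train_dataset = raw["train"].map(to_preference)
eval_dataset = raw["validation"].map(to_preference)

args = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR1e-7_2epochs_old",
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: trl default, not stated in this card
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```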
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6932 | 0.0172 | 100 | 0.6932 | -0.0000 | 0.0000 | 0.4930 | -0.0001 | -63.1768 | -58.7146 | -3.1573 | -3.1630 |
0.6932 | 0.0345 | 200 | 0.6932 | -0.0001 | -0.0000 | 0.4772 | -0.0001 | -63.1802 | -58.7210 | -3.1574 | -3.1630 |
0.6931 | 0.0517 | 300 | 0.6932 | -0.0000 | 0.0001 | 0.4840 | -0.0001 | -63.1670 | -58.7127 | -3.1573 | -3.1630 |
0.693 | 0.0689 | 400 | 0.6932 | -0.0000 | 0.0001 | 0.4828 | -0.0001 | -63.1728 | -58.7120 | -3.1575 | -3.1632 |
0.6931 | 0.0861 | 500 | 0.6932 | 0.0002 | 0.0003 | 0.4775 | -0.0001 | -63.1514 | -58.6883 | -3.1571 | -3.1627 |
0.6924 | 0.1034 | 600 | 0.6931 | 0.0004 | 0.0003 | 0.5021 | 0.0001 | -63.1466 | -58.6704 | -3.1564 | -3.1621 |
0.6926 | 0.1206 | 700 | 0.6931 | 0.0006 | 0.0004 | 0.5163 | 0.0002 | -63.1388 | -58.6536 | -3.1556 | -3.1613 |
0.6922 | 0.1378 | 800 | 0.6930 | 0.0011 | 0.0007 | 0.5328 | 0.0004 | -63.1062 | -58.6016 | -3.1544 | -3.1601 |
0.6919 | 0.1551 | 900 | 0.6928 | 0.0015 | 0.0008 | 0.5467 | 0.0008 | -63.1024 | -58.5586 | -3.1525 | -3.1581 |
0.6924 | 0.1723 | 1000 | 0.6926 | 0.0018 | 0.0007 | 0.5632 | 0.0011 | -63.1061 | -58.5285 | -3.1495 | -3.1551 |
0.6913 | 0.1895 | 1100 | 0.6924 | 0.0021 | 0.0006 | 0.5748 | 0.0015 | -63.1198 | -58.5001 | -3.1456 | -3.1512 |
0.6911 | 0.2068 | 1200 | 0.6921 | 0.0023 | 0.0001 | 0.5829 | 0.0022 | -63.1702 | -58.4863 | -3.1409 | -3.1465 |
0.6911 | 0.2240 | 1300 | 0.6918 | 0.0018 | -0.0011 | 0.5783 | 0.0029 | -63.2862 | -58.5324 | -3.1359 | -3.1415 |
0.6871 | 0.2412 | 1400 | 0.6914 | 0.0005 | -0.0030 | 0.5718 | 0.0036 | -63.4832 | -58.6569 | -3.1301 | -3.1358 |
0.6865 | 0.2584 | 1500 | 0.6910 | -0.0015 | -0.0060 | 0.5760 | 0.0045 | -63.7806 | -58.8602 | -3.1249 | -3.1305 |
0.6876 | 0.2757 | 1600 | 0.6906 | -0.0038 | -0.0091 | 0.5860 | 0.0053 | -64.0945 | -59.0966 | -3.1178 | -3.1235 |
0.6883 | 0.2929 | 1700 | 0.6903 | -0.0066 | -0.0127 | 0.5846 | 0.0061 | -64.4541 | -59.3744 | -3.1115 | -3.1171 |
0.684 | 0.3101 | 1800 | 0.6900 | -0.0121 | -0.0190 | 0.5843 | 0.0069 | -65.0824 | -59.9254 | -3.1036 | -3.1093 |
0.6834 | 0.3274 | 1900 | 0.6895 | -0.0157 | -0.0236 | 0.5881 | 0.0078 | -65.5351 | -60.2850 | -3.0983 | -3.1039 |
0.6852 | 0.3446 | 2000 | 0.6890 | -0.0228 | -0.0319 | 0.5888 | 0.0091 | -66.3715 | -60.9889 | -3.0904 | -3.0961 |
0.6827 | 0.3618 | 2100 | 0.6883 | -0.0310 | -0.0417 | 0.5885 | 0.0107 | -67.3509 | -61.8145 | -3.0840 | -3.0897 |
0.6745 | 0.3790 | 2200 | 0.6876 | -0.0382 | -0.0505 | 0.5860 | 0.0123 | -68.2293 | -62.5301 | -3.0753 | -3.0810 |
0.678 | 0.3963 | 2300 | 0.6872 | -0.0406 | -0.0536 | 0.5890 | 0.0131 | -68.5438 | -62.7670 | -3.0691 | -3.0748 |
0.6808 | 0.4135 | 2400 | 0.6867 | -0.0471 | -0.0614 | 0.5881 | 0.0143 | -69.3158 | -63.4223 | -3.0596 | -3.0652 |
0.683 | 0.4307 | 2500 | 0.6861 | -0.0556 | -0.0712 | 0.5897 | 0.0157 | -70.3045 | -64.2686 | -3.0500 | -3.0557 |
0.6754 | 0.4480 | 2600 | 0.6856 | -0.0611 | -0.0780 | 0.5885 | 0.0169 | -70.9754 | -64.8212 | -3.0432 | -3.0489 |
0.6768 | 0.4652 | 2700 | 0.6851 | -0.0674 | -0.0855 | 0.5927 | 0.0181 | -71.7327 | -65.4567 | -3.0371 | -3.0427 |
0.6767 | 0.4824 | 2800 | 0.6846 | -0.0729 | -0.0920 | 0.5943 | 0.0192 | -72.3822 | -65.9983 | -3.0311 | -3.0368 |
0.677 | 0.4997 | 2900 | 0.6843 | -0.0755 | -0.0955 | 0.5997 | 0.0200 | -72.7311 | -66.2650 | -3.0233 | -3.0290 |
0.678 | 0.5169 | 3000 | 0.6838 | -0.0814 | -0.1025 | 0.6008 | 0.0211 | -73.4252 | -66.8486 | -3.0141 | -3.0198 |
0.67 | 0.5341 | 3100 | 0.6836 | -0.0822 | -0.1038 | 0.6018 | 0.0216 | -73.5633 | -66.9356 | -3.0096 | -3.0153 |
0.6718 | 0.5513 | 3200 | 0.6827 | -0.0939 | -0.1175 | 0.6034 | 0.0236 | -74.9309 | -68.1066 | -2.9982 | -3.0040 |
0.6724 | 0.5686 | 3300 | 0.6821 | -0.0998 | -0.1249 | 0.6041 | 0.0251 | -75.6721 | -68.6965 | -2.9850 | -2.9907 |
0.6625 | 0.5858 | 3400 | 0.6819 | -0.1010 | -0.1266 | 0.6066 | 0.0256 | -75.8434 | -68.8117 | -2.9759 | -2.9817 |
0.6743 | 0.6030 | 3500 | 0.6814 | -0.1069 | -0.1336 | 0.6113 | 0.0267 | -76.5408 | -69.4021 | -2.9688 | -2.9746 |
0.6721 | 0.6203 | 3600 | 0.6810 | -0.1127 | -0.1405 | 0.6078 | 0.0278 | -77.2252 | -69.9806 | -2.9599 | -2.9657 |
0.664 | 0.6375 | 3700 | 0.6804 | -0.1212 | -0.1504 | 0.6073 | 0.0292 | -78.2202 | -70.8276 | -2.9486 | -2.9544 |
0.6644 | 0.6547 | 3800 | 0.6795 | -0.1329 | -0.1643 | 0.6104 | 0.0313 | -79.6058 | -72.0042 | -2.9392 | -2.9450 |
0.6665 | 0.6720 | 3900 | 0.6787 | -0.1452 | -0.1785 | 0.6059 | 0.0333 | -81.0310 | -73.2281 | -2.9298 | -2.9357 |
0.6653 | 0.6892 | 4000 | 0.6781 | -0.1559 | -0.1908 | 0.6062 | 0.0349 | -82.2593 | -74.3019 | -2.9178 | -2.9236 |
0.6534 | 0.7064 | 4100 | 0.6777 | -0.1615 | -0.1973 | 0.6080 | 0.0359 | -82.9142 | -74.8574 | -2.9005 | -2.9063 |
0.6736 | 0.7236 | 4200 | 0.6769 | -0.1724 | -0.2103 | 0.6069 | 0.0379 | -84.2087 | -75.9475 | -2.8890 | -2.8949 |
0.6617 | 0.7409 | 4300 | 0.6764 | -0.1802 | -0.2194 | 0.6071 | 0.0392 | -85.1160 | -76.7326 | -2.8792 | -2.8851 |
0.6625 | 0.7581 | 4400 | 0.6756 | -0.1938 | -0.2351 | 0.6039 | 0.0413 | -86.6852 | -78.0909 | -2.8681 | -2.8740 |
0.6604 | 0.7753 | 4500 | 0.6746 | -0.2102 | -0.2541 | 0.6076 | 0.0439 | -88.5854 | -79.7309 | -2.8589 | -2.8650 |
0.6436 | 0.7926 | 4600 | 0.6736 | -0.2248 | -0.2712 | 0.6066 | 0.0463 | -90.2984 | -81.1957 | -2.8510 | -2.8569 |
0.6527 | 0.8098 | 4700 | 0.6728 | -0.2396 | -0.2882 | 0.6078 | 0.0486 | -92.0000 | -82.6740 | -2.8433 | -2.8492 |
0.6604 | 0.8270 | 4800 | 0.6721 | -0.2501 | -0.3005 | 0.6066 | 0.0504 | -93.2272 | -83.7222 | -2.8340 | -2.8399 |
0.6665 | 0.8442 | 4900 | 0.6713 | -0.2626 | -0.3152 | 0.6053 | 0.0526 | -94.6995 | -84.9707 | -2.8265 | -2.8324 |
0.65 | 0.8615 | 5000 | 0.6706 | -0.2707 | -0.3251 | 0.5936 | 0.0543 | -95.6856 | -85.7848 | -2.8110 | -2.8169 |
0.6625 | 0.8787 | 5100 | 0.6697 | -0.2838 | -0.3407 | 0.5941 | 0.0569 | -97.2505 | -87.0959 | -2.8023 | -2.8083 |
0.6511 | 0.8959 | 5200 | 0.6695 | -0.2869 | -0.3443 | 0.5983 | 0.0574 | -97.6072 | -87.3982 | -2.7964 | -2.8024 |
0.6473 | 0.9132 | 5300 | 0.6691 | -0.2904 | -0.3488 | 0.5992 | 0.0584 | -98.0594 | -87.7473 | -2.7880 | -2.7940 |
0.6492 | 0.9304 | 5400 | 0.6687 | -0.2941 | -0.3536 | 0.6004 | 0.0594 | -98.5365 | -88.1234 | -2.7825 | -2.7885 |
0.6409 | 0.9476 | 5500 | 0.6682 | -0.3026 | -0.3636 | 0.5978 | 0.0609 | -99.5376 | -88.9754 | -2.7736 | -2.7795 |
0.6531 | 0.9649 | 5600 | 0.6679 | -0.2997 | -0.3615 | 0.6006 | 0.0617 | -99.3275 | -88.6850 | -2.7683 | -2.7743 |
0.6523 | 0.9821 | 5700 | 0.6671 | -0.3127 | -0.3766 | 0.6018 | 0.0639 | -100.8429 | -89.9807 | -2.7604 | -2.7664 |
0.6355 | 0.9993 | 5800 | 0.6663 | -0.3277 | -0.3941 | 0.6078 | 0.0664 | -102.5891 | -91.4845 | -2.7485 | -2.7544 |
0.6363 | 1.0165 | 5900 | 0.6654 | -0.3506 | -0.4200 | 0.6013 | 0.0695 | -105.1840 | -93.7690 | -2.7327 | -2.7388 |
0.6587 | 1.0338 | 6000 | 0.6654 | -0.3455 | -0.4149 | 0.6090 | 0.0694 | -104.6700 | -93.2587 | -2.7256 | -2.7317 |
0.6335 | 1.0510 | 6100 | 0.6650 | -0.3500 | -0.4204 | 0.6085 | 0.0704 | -105.2201 | -93.7083 | -2.7173 | -2.7233 |
0.637 | 1.0682 | 6200 | 0.6641 | -0.3684 | -0.4416 | 0.6083 | 0.0731 | -107.3361 | -95.5533 | -2.7081 | -2.7143 |
0.6557 | 1.0855 | 6300 | 0.6634 | -0.3813 | -0.4567 | 0.6092 | 0.0754 | -108.8497 | -96.8372 | -2.7011 | -2.7073 |
0.6406 | 1.1027 | 6400 | 0.6629 | -0.3842 | -0.4611 | 0.6104 | 0.0769 | -109.2875 | -97.1323 | -2.6938 | -2.7001 |
0.6445 | 1.1199 | 6500 | 0.6627 | -0.3897 | -0.4671 | 0.6104 | 0.0774 | -109.8874 | -97.6783 | -2.6856 | -2.6919 |
0.6444 | 1.1371 | 6600 | 0.6626 | -0.3914 | -0.4693 | 0.6087 | 0.0779 | -110.1084 | -97.8481 | -2.6817 | -2.6880 |
0.6412 | 1.1544 | 6700 | 0.6621 | -0.3997 | -0.4794 | 0.6094 | 0.0796 | -111.1156 | -98.6842 | -2.6724 | -2.6787 |
0.6223 | 1.1716 | 6800 | 0.6614 | -0.4163 | -0.4982 | 0.6145 | 0.0819 | -113.0004 | -100.3420 | -2.6623 | -2.6687 |
0.6439 | 1.1888 | 6900 | 0.6612 | -0.4231 | -0.5061 | 0.6106 | 0.0830 | -113.7915 | -101.0268 | -2.6555 | -2.6619 |
0.6269 | 1.2061 | 7000 | 0.6606 | -0.4424 | -0.5279 | 0.6099 | 0.0855 | -115.9700 | -102.9478 | -2.6489 | -2.6553 |
0.6301 | 1.2233 | 7100 | 0.6603 | -0.4383 | -0.5243 | 0.6122 | 0.0860 | -115.6095 | -102.5456 | -2.6439 | -2.6503 |
0.625 | 1.2405 | 7200 | 0.6600 | -0.4436 | -0.5309 | 0.6129 | 0.0873 | -116.2657 | -103.0681 | -2.6385 | -2.6450 |
0.653 | 1.2578 | 7300 | 0.6599 | -0.4335 | -0.5204 | 0.6134 | 0.0868 | -115.2167 | -102.0655 | -2.6367 | -2.6430 |
0.6456 | 1.2750 | 7400 | 0.6600 | -0.4315 | -0.5182 | 0.6113 | 0.0866 | -114.9959 | -101.8630 | -2.6344 | -2.6409 |
0.6454 | 1.2922 | 7500 | 0.6597 | -0.4307 | -0.5182 | 0.6162 | 0.0875 | -114.9953 | -101.7817 | -2.6295 | -2.6359 |
0.6769 | 1.3094 | 7600 | 0.6593 | -0.4390 | -0.5278 | 0.6162 | 0.0888 | -115.9591 | -102.6077 | -2.6216 | -2.6281 |
0.6367 | 1.3267 | 7700 | 0.6591 | -0.4402 | -0.5295 | 0.6166 | 0.0893 | -116.1309 | -102.7307 | -2.6170 | -2.6235 |
0.621 | 1.3439 | 7800 | 0.6587 | -0.4486 | -0.5395 | 0.6190 | 0.0909 | -117.1267 | -103.5701 | -2.6111 | -2.6176 |
0.6413 | 1.3611 | 7900 | 0.6581 | -0.4553 | -0.5479 | 0.6201 | 0.0926 | -117.9684 | -104.2417 | -2.6072 | -2.6137 |
0.6228 | 1.3784 | 8000 | 0.6580 | -0.4586 | -0.5519 | 0.6217 | 0.0932 | -118.3658 | -104.5737 | -2.6039 | -2.6105 |
0.6409 | 1.3956 | 8100 | 0.6577 | -0.4652 | -0.5596 | 0.6213 | 0.0944 | -119.1380 | -105.2326 | -2.5999 | -2.6065 |
0.6504 | 1.4128 | 8200 | 0.6572 | -0.4709 | -0.5666 | 0.6166 | 0.0958 | -119.8450 | -105.8004 | -2.5972 | -2.6038 |
0.6468 | 1.4300 | 8300 | 0.6573 | -0.4657 | -0.5609 | 0.6231 | 0.0953 | -119.2726 | -105.2789 | -2.5953 | -2.6019 |
0.6301 | 1.4473 | 8400 | 0.6574 | -0.4609 | -0.5559 | 0.6211 | 0.0950 | -118.7683 | -104.8034 | -2.5927 | -2.5993 |
0.6207 | 1.4645 | 8500 | 0.6575 | -0.4578 | -0.5526 | 0.6187 | 0.0948 | -118.4422 | -104.4934 | -2.5884 | -2.5951 |
0.6661 | 1.4817 | 8600 | 0.6570 | -0.4650 | -0.5611 | 0.6206 | 0.0961 | -119.2866 | -105.2096 | -2.5845 | -2.5911 |
0.6475 | 1.4990 | 8700 | 0.6572 | -0.4575 | -0.5529 | 0.6197 | 0.0954 | -118.4655 | -104.4587 | -2.5841 | -2.5908 |
0.6478 | 1.5162 | 8800 | 0.6569 | -0.4607 | -0.5569 | 0.6199 | 0.0962 | -118.8732 | -104.7842 | -2.5812 | -2.5879 |
0.6338 | 1.5334 | 8900 | 0.6566 | -0.4694 | -0.5668 | 0.6201 | 0.0974 | -119.8600 | -105.6548 | -2.5766 | -2.5833 |
0.6283 | 1.5507 | 9000 | 0.6565 | -0.4721 | -0.5700 | 0.6199 | 0.0979 | -120.1781 | -105.9173 | -2.5752 | -2.5819 |
0.6462 | 1.5679 | 9100 | 0.6564 | -0.4728 | -0.5710 | 0.6187 | 0.0982 | -120.2769 | -105.9869 | -2.5728 | -2.5796 |
0.6228 | 1.5851 | 9200 | 0.6562 | -0.4767 | -0.5756 | 0.6194 | 0.0989 | -120.7382 | -106.3830 | -2.5720 | -2.5787 |
0.6224 | 1.6023 | 9300 | 0.6561 | -0.4771 | -0.5764 | 0.6197 | 0.0993 | -120.8189 | -106.4213 | -2.5689 | -2.5756 |
0.6286 | 1.6196 | 9400 | 0.6558 | -0.4825 | -0.5830 | 0.6211 | 0.1004 | -121.4753 | -106.9631 | -2.5668 | -2.5735 |
0.6221 | 1.6368 | 9500 | 0.6558 | -0.4833 | -0.5838 | 0.6199 | 0.1005 | -121.5581 | -107.0399 | -2.5650 | -2.5717 |
0.6358 | 1.6540 | 9600 | 0.6557 | -0.4891 | -0.5901 | 0.6194 | 0.1010 | -122.1902 | -107.6185 | -2.5614 | -2.5681 |
0.6358 | 1.6713 | 9700 | 0.6556 | -0.4886 | -0.5899 | 0.6206 | 0.1013 | -122.1670 | -107.5694 | -2.5605 | -2.5673 |
0.6243 | 1.6885 | 9800 | 0.6554 | -0.4898 | -0.5916 | 0.6211 | 0.1019 | -122.3449 | -107.6895 | -2.5598 | -2.5665 |
0.5825 | 1.7057 | 9900 | 0.6554 | -0.4917 | -0.5936 | 0.6211 | 0.1019 | -122.5433 | -107.8852 | -2.5589 | -2.5656 |
0.6181 | 1.7229 | 10000 | 0.6552 | -0.4927 | -0.5951 | 0.6208 | 0.1024 | -122.6864 | -107.9799 | -2.5578 | -2.5645 |
0.6364 | 1.7402 | 10100 | 0.6553 | -0.4917 | -0.5940 | 0.6201 | 0.1023 | -122.5787 | -107.8781 | -2.5562 | -2.5630 |
0.6272 | 1.7574 | 10200 | 0.6552 | -0.4947 | -0.5974 | 0.6206 | 0.1027 | -122.9187 | -108.1824 | -2.5552 | -2.5620 |
0.6576 | 1.7746 | 10300 | 0.6551 | -0.4968 | -0.5997 | 0.6204 | 0.1029 | -123.1503 | -108.3895 | -2.5543 | -2.5610 |
0.6036 | 1.7919 | 10400 | 0.6552 | -0.4950 | -0.5977 | 0.6187 | 0.1027 | -122.9548 | -108.2141 | -2.5535 | -2.5603 |
0.6174 | 1.8091 | 10500 | 0.6551 | -0.4961 | -0.5990 | 0.6194 | 0.1029 | -123.0769 | -108.3228 | -2.5536 | -2.5603 |
0.6403 | 1.8263 | 10600 | 0.6551 | -0.4962 | -0.5992 | 0.6197 | 0.1030 | -123.0967 | -108.3300 | -2.5527 | -2.5595 |
0.6341 | 1.8436 | 10700 | 0.6551 | -0.4973 | -0.6004 | 0.6185 | 0.1031 | -123.2222 | -108.4462 | -2.5520 | -2.5588 |
0.627 | 1.8608 | 10800 | 0.6549 | -0.4976 | -0.6011 | 0.6211 | 0.1035 | -123.2887 | -108.4688 | -2.5518 | -2.5586 |
0.6336 | 1.8780 | 10900 | 0.6549 | -0.4972 | -0.6009 | 0.6201 | 0.1037 | -123.2694 | -108.4345 | -2.5519 | -2.5587 |
0.626 | 1.8952 | 11000 | 0.6550 | -0.4983 | -0.6016 | 0.6206 | 0.1034 | -123.3421 | -108.5379 | -2.5516 | -2.5584 |
0.6314 | 1.9125 | 11100 | 0.6551 | -0.4974 | -0.6004 | 0.6194 | 0.1030 | -123.2212 | -108.4520 | -2.5517 | -2.5585 |
0.6239 | 1.9297 | 11200 | 0.6549 | -0.4976 | -0.6012 | 0.6192 | 0.1036 | -123.3044 | -108.4749 | -2.5519 | -2.5587 |
0.6632 | 1.9469 | 11300 | 0.6550 | -0.4977 | -0.6011 | 0.6194 | 0.1033 | -123.2879 | -108.4866 | -2.5514 | -2.5582 |
0.6306 | 1.9642 | 11400 | 0.6550 | -0.4978 | -0.6010 | 0.6183 | 0.1032 | -123.2786 | -108.4874 | -2.5514 | -2.5583 |
0.6532 | 1.9814 | 11500 | 0.6549 | -0.4977 | -0.6012 | 0.6206 | 0.1035 | -123.3012 | -108.4803 | -2.5513 | -2.5581 |
0.6257 | 1.9986 | 11600 | 0.6549 | -0.4976 | -0.6011 | 0.6194 | 0.1035 | -123.2918 | -108.4708 | -2.5511 | -2.5579 |
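For reference, the reward columns above follow the usual DPO bookkeeping: Rewards/chosen and Rewards/rejected are the beta-scaled log-probability ratios between the policy and the reference model on the chosen and rejected summaries, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs with a positive margin. The snippet below is illustrative only; beta=0.1 is an assumption (the card does not state it).

```python
# Illustrative only: how the logged reward columns relate to per-example log-probabilities.
import torch
import torch.nn.functional as F

beta = 0.1  # assumed, not stated in this card

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps):
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)      # "Rewards/chosen"
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # "Rewards/rejected"
    margins = rewards_chosen - rewards_rejected                           # "Rewards/margins"
    loss = -F.logsigmoid(margins).mean()                                  # the DPO objective
    accuracy = (margins > 0).float().mean()                               # "Rewards/accuracies"
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```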
Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1