llama-7b-dpo-qlora-relu

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on the HuggingFaceH4/ultrafeedback_binarized dataset. At the end of training (step 3820, epoch 1.0), it achieves the following results on the evaluation set:

  • Loss: 0.6423
  • Rewards/chosen: 0.9449
  • Rewards/rejected: 0.6402
  • Rewards/accuracies: 0.6670
  • Rewards/margins: 0.3047
  • Logps/rejected: -2686.5962
  • Logps/chosen: -3150.4404
  • Logits/rejected: 0.2397
  • Logits/chosen: 0.1410
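
The reward figures above are related by simple arithmetic: the reported margin equals the chosen reward minus the rejected reward. The sketch below checks that relation and, as an assumption, evaluates the standard sigmoid DPO loss for a single preference pair. This card does not state which loss variant or beta value was used, so `sigmoid_dpo_loss` is illustrative only.

```python
import math

# Values copied from the evaluation summary above.
rewards_chosen = 0.9449
rewards_rejected = 0.6402
reported_margin = 0.3047

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.3047


def sigmoid_dpo_loss(pair_margin: float) -> float:
    """Standard sigmoid DPO loss for ONE pair: -log(sigmoid(margin)).

    Assumption: the card does not confirm that this loss variant was used.
    """
    return math.log1p(math.exp(-pair_margin))


# With no preference signal (margin 0) the loss is ln 2.
print(round(sigmoid_dpo_loss(0.0), 4))  # 0.6931
```

The ln 2 baseline is a useful reference point for the first rows of the training results, where both the training and validation loss sit near 0.69. The reported loss is an average over evaluation pairs, so it need not equal the loss evaluated at the average margin.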

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
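
The total batch sizes follow from the per-device sizes, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic and outlines a linear-warmup, cosine-decay learning-rate curve. It assumes the schedule spans the 3820 steps shown in the training results and that warmup covers about 10% of them. The function `lr_at` is a hypothetical helper, not the Trainer's own scheduler.

```python
import math

# Hyperparameters copied from the list above.
train_batch_size = 1
eval_batch_size = 2
num_devices = 2
gradient_accumulation_steps = 8
peak_lr = 5e-06
warmup_ratio = 0.1

# Effective batch sizes.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 16 4

# Assumption: the run length is the last step listed in the results table.
total_steps = 3820
warmup_steps = round(warmup_ratio * total_steps)
examples_seen = total_steps * total_train_batch_size
print(warmup_steps, examples_seen)  # 382 61120


def lr_at(step: int) -> float:
    """Illustrative linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the learning rate peaks at 5e-06 around step 382 and decays toward zero by step 3820.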

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6918 0.01 20 0.6927 0.0105 0.0089 0.4990 0.0016 -2749.7332 -3243.8831 0.4756 0.4010
0.6888 0.01 40 0.6865 0.1092 0.0901 0.5570 0.0191 -2741.6096 -3234.0100 0.4667 0.3920
0.6812 0.02 60 0.6778 0.3441 0.2892 0.5580 0.0549 -2721.7036 -3210.5227 0.4384 0.3634
0.6845 0.02 80 0.6751 0.5007 0.4191 0.5530 0.0815 -2708.7063 -3194.8674 0.4217 0.3468
0.6855 0.03 100 0.6733 0.6956 0.5819 0.5500 0.1137 -2692.4290 -3175.3694 0.3896 0.3148
0.6642 0.03 120 0.6705 0.5230 0.4322 0.5710 0.0909 -2707.4033 -3192.6296 0.4159 0.3416
0.6701 0.04 140 0.6716 0.5848 0.4825 0.5710 0.1023 -2702.3718 -3186.4568 0.3973 0.3170
0.7142 0.04 160 0.6677 0.4415 0.3502 0.5850 0.0913 -2715.6021 -3200.7874 0.4151 0.3347
0.6615 0.05 180 0.6625 0.5577 0.4403 0.5990 0.1174 -2706.5872 -3189.1589 0.4109 0.3326
0.6665 0.05 200 0.6631 0.9369 0.7339 0.5860 0.2030 -2677.2251 -3151.2400 0.4161 0.3420
0.6708 0.06 220 0.6643 0.4623 0.3170 0.5920 0.1453 -2718.9246 -3198.7063 0.4936 0.4235
0.683 0.06 240 0.6630 0.5279 0.3786 0.6160 0.1493 -2712.7622 -3192.1443 0.4461 0.3650
0.6545 0.07 260 0.6642 0.7057 0.5381 0.6220 0.1676 -2696.8049 -3174.3584 0.4233 0.3391
0.6447 0.07 280 0.6697 0.9829 0.7689 0.6040 0.2140 -2673.7317 -3146.6445 0.3740 0.2969
0.6532 0.08 300 0.6842 1.0988 0.8235 0.6160 0.2752 -2668.2654 -3135.0552 0.3932 0.3240
0.6508 0.08 320 0.6766 0.4977 0.3186 0.6110 0.1791 -2718.7561 -3195.1597 0.4100 0.3256
0.6363 0.09 340 0.6838 0.6603 0.4982 0.5950 0.1621 -2700.7981 -3178.8992 0.3598 0.2745
0.7016 0.09 360 0.6749 1.1088 0.8535 0.6150 0.2553 -2665.2732 -3134.0569 0.3153 0.2233
0.6508 0.1 380 0.6655 0.8342 0.5982 0.6170 0.2360 -2690.8040 -3161.5134 0.3631 0.2783
0.7066 0.1 400 0.6643 0.4586 0.3004 0.6090 0.1582 -2720.5776 -3199.0710 0.3913 0.3081
0.6569 0.11 420 0.6895 1.7461 1.4097 0.5970 0.3364 -2609.6536 -3070.3232 0.1995 0.0996
0.6971 0.12 440 0.6804 0.3106 0.1542 0.5970 0.1564 -2735.1970 -3213.8723 0.4654 0.3795
0.7179 0.12 460 0.6708 0.3908 0.2148 0.6060 0.1760 -2729.1362 -3205.8511 0.4534 0.3755
0.6713 0.13 480 0.6653 1.1610 0.9019 0.6100 0.2591 -2660.4290 -3128.8379 0.2921 0.2072
0.7025 0.13 500 0.6618 0.8239 0.6190 0.6230 0.2048 -2688.7156 -3162.5444 0.3598 0.2752
0.6805 0.14 520 0.6632 1.1599 0.9100 0.6100 0.2499 -2659.6174 -3128.9429 0.3036 0.2199
0.6669 0.14 540 0.6762 0.4281 0.2712 0.6010 0.1569 -2723.4954 -3202.1235 0.3960 0.3262
0.7231 0.15 560 0.6819 1.6382 1.2978 0.6100 0.3405 -2620.8401 -3081.1089 0.2061 0.1212
0.6914 0.15 580 0.6667 0.7317 0.5114 0.6120 0.2203 -2699.4773 -3171.7651 0.3602 0.2804
0.6744 0.16 600 0.6655 1.3122 1.0204 0.6140 0.2917 -2648.5754 -3113.7166 0.2893 0.2001
0.7202 0.16 620 0.6704 1.3732 1.0696 0.6190 0.3035 -2643.6584 -3107.6179 0.3039 0.2156
0.6505 0.17 640 0.6631 1.0842 0.8426 0.6320 0.2416 -2666.3557 -3136.5125 0.2946 0.2053
0.6678 0.17 660 0.6688 0.7100 0.5343 0.6170 0.1758 -2697.1909 -3173.9294 0.3019 0.2101
0.6905 0.18 680 0.6601 1.1264 0.8772 0.6300 0.2492 -2662.8979 -3132.2937 0.2674 0.1735
0.6414 0.18 700 0.6684 0.7719 0.5596 0.6280 0.2123 -2694.6565 -3167.7427 0.3401 0.2509
0.6752 0.19 720 0.6932 1.8703 1.4853 0.6140 0.3850 -2602.0854 -3057.8987 0.1787 0.0785
0.6982 0.19 740 0.6774 0.4667 0.2947 0.6160 0.1720 -2721.1499 -3198.2676 0.3575 0.2655
0.6149 0.2 760 0.6715 1.5227 1.1845 0.6310 0.3383 -2632.1743 -3092.6604 0.2016 0.1056
0.6568 0.2 780 0.6975 0.1888 -0.0062 0.6020 0.1951 -2751.2429 -3226.0491 0.4294 0.3384
0.633 0.21 800 0.6989 2.0748 1.6194 0.6130 0.4554 -2588.6804 -3037.4561 0.1522 0.0436
0.6907 0.21 820 0.6632 1.0945 0.8066 0.6350 0.2879 -2669.9553 -3135.4792 0.3036 0.2037
0.6582 0.22 840 0.6571 0.8583 0.6168 0.6260 0.2416 -2688.9436 -3159.1021 0.3119 0.2173
0.6568 0.23 860 0.6718 0.4558 0.2827 0.6090 0.1732 -2722.3523 -3199.3511 0.3512 0.2592
0.6589 0.23 880 0.6679 1.3269 1.0100 0.625 0.3169 -2649.6179 -3112.2434 0.2110 0.1108
0.6371 0.24 900 0.6656 1.1832 0.8731 0.6300 0.3101 -2663.3120 -3126.6121 0.2307 0.1377
0.7471 0.24 920 0.6693 0.8367 0.5850 0.6390 0.2517 -2692.1221 -3161.2661 0.2916 0.2077
0.6415 0.25 940 0.6632 1.0762 0.8094 0.6370 0.2669 -2669.6843 -3137.3086 0.2347 0.1441
0.7267 0.25 960 0.6971 2.0368 1.6586 0.5930 0.3781 -2584.7571 -3041.2559 0.0743 -0.0256
0.6586 0.26 980 0.6856 0.3772 0.2421 0.6090 0.1351 -2726.4094 -3207.2104 0.3268 0.2380
0.7058 0.26 1000 0.6665 1.0340 0.7988 0.6310 0.2352 -2670.7419 -3141.5334 0.2264 0.1320
0.6562 0.27 1020 0.6731 0.4362 0.2631 0.6220 0.1731 -2724.3091 -3201.3096 0.3141 0.2192
0.6695 0.27 1040 0.6666 0.9000 0.6468 0.6240 0.2532 -2685.9409 -3154.9338 0.2496 0.1522
0.6998 0.28 1060 0.6631 0.9608 0.7039 0.6270 0.2569 -2680.2302 -3148.8518 0.2293 0.1286
0.6467 0.28 1080 0.6611 0.9271 0.6794 0.6310 0.2477 -2682.6790 -3152.2249 0.2534 0.1543
0.7014 0.29 1100 0.6916 0.1793 0.0194 0.5970 0.1599 -2748.6746 -3227.0022 0.4020 0.3112
0.6383 0.29 1120 0.6646 1.2449 0.9461 0.6190 0.2988 -2656.0103 -3120.4397 0.2246 0.1310
0.6594 0.3 1140 0.6694 1.2174 0.9267 0.625 0.2907 -2657.9519 -3123.1938 0.2294 0.1372
0.6662 0.3 1160 0.6692 0.7808 0.5201 0.6340 0.2606 -2698.6074 -3166.8572 0.3542 0.2664
0.6439 0.31 1180 0.6644 0.9192 0.6222 0.6410 0.2970 -2688.3950 -3153.0110 0.3655 0.2800
0.6218 0.31 1200 0.6586 1.0825 0.7651 0.6430 0.3175 -2674.1140 -3136.6797 0.3050 0.2116
0.68 0.32 1220 0.6571 0.9931 0.6987 0.6560 0.2944 -2680.7493 -3145.6201 0.3002 0.2058
0.631 0.32 1240 0.6606 1.4409 1.0899 0.6450 0.3511 -2641.6331 -3100.8398 0.2226 0.1298
0.6553 0.33 1260 0.6755 1.3941 1.0416 0.6360 0.3525 -2646.4556 -3105.5215 0.1853 0.0877
0.656 0.33 1280 0.6742 1.6210 1.2561 0.6470 0.3649 -2625.0129 -3082.8352 0.1333 0.0343
0.6968 0.34 1300 0.6620 1.5566 1.2255 0.6370 0.3311 -2628.0706 -3089.2764 0.1418 0.0440
0.6756 0.35 1320 0.6619 1.4656 1.1785 0.6260 0.2871 -2632.7727 -3098.3765 0.1436 0.0456
0.651 0.35 1340 0.6586 0.9936 0.7542 0.6330 0.2394 -2675.2009 -3145.5730 0.2575 0.1608
0.6863 0.36 1360 0.6593 1.0603 0.7861 0.6410 0.2742 -2672.0063 -3138.9028 0.2625 0.1624
0.6671 0.36 1380 0.6585 0.9249 0.6679 0.6300 0.2570 -2683.8271 -3152.4412 0.2769 0.1792
0.6495 0.37 1400 0.6559 1.0075 0.7487 0.6410 0.2589 -2675.7534 -3144.1819 0.2563 0.1592
0.6505 0.37 1420 0.6666 0.5015 0.3152 0.6310 0.1862 -2719.0969 -3194.7869 0.3321 0.2419
0.6855 0.38 1440 0.6567 0.8450 0.6100 0.6470 0.2350 -2689.6213 -3160.4331 0.2770 0.1859
0.6501 0.38 1460 0.6599 0.7577 0.5266 0.6390 0.2311 -2697.9607 -3169.1663 0.2910 0.1981
0.649 0.39 1480 0.6599 1.2617 0.9540 0.6420 0.3077 -2655.2158 -3118.7607 0.2065 0.1052
0.6554 0.39 1500 0.6583 1.0495 0.7839 0.6490 0.2656 -2672.2302 -3139.9814 0.2280 0.1330
0.6749 0.4 1520 0.6606 0.8217 0.5860 0.6320 0.2356 -2692.0178 -3162.7683 0.2671 0.1767
0.6857 0.4 1540 0.6595 0.7859 0.5242 0.6460 0.2617 -2698.1951 -3166.3406 0.3070 0.2132
0.6507 0.41 1560 0.6542 0.9973 0.6889 0.6470 0.3084 -2681.7246 -3145.1982 0.2675 0.1687
0.6126 0.41 1580 0.6575 1.2987 0.9358 0.6440 0.3629 -2657.0410 -3115.0645 0.2162 0.1168
0.6109 0.42 1600 0.6630 1.4768 1.0912 0.6350 0.3857 -2641.5007 -3097.2493 0.1774 0.0735
0.6221 0.42 1620 0.6609 1.2858 0.9370 0.6470 0.3488 -2656.9226 -3116.3562 0.1969 0.0922
0.6565 0.43 1640 0.6651 0.7151 0.4459 0.6400 0.2692 -2706.0293 -3173.4238 0.2898 0.1894
0.5982 0.43 1660 0.6571 1.4690 1.0905 0.6410 0.3785 -2641.5686 -3098.0374 0.1833 0.0805
0.6986 0.44 1680 0.6550 1.1146 0.7781 0.6480 0.3365 -2672.8064 -3133.4736 0.2533 0.1546
0.6316 0.44 1700 0.6606 1.6375 1.2530 0.6360 0.3845 -2625.3179 -3081.1812 0.1494 0.0475
0.6618 0.45 1720 0.6571 1.0847 0.7877 0.6440 0.2969 -2671.8479 -3136.4675 0.2297 0.1309
0.7146 0.46 1740 0.6609 1.4069 1.0677 0.6420 0.3392 -2643.8464 -3104.2388 0.1950 0.0944
0.7156 0.46 1760 0.6546 1.0781 0.7864 0.6530 0.2917 -2671.9775 -3137.1184 0.2555 0.1579
0.6817 0.47 1780 0.6729 0.5426 0.3537 0.6190 0.1888 -2715.2463 -3190.6765 0.3162 0.2207
0.6277 0.47 1800 0.6605 1.4863 1.1568 0.6330 0.3295 -2634.9365 -3096.2996 0.1666 0.0620
0.6093 0.48 1820 0.6556 1.3461 1.0113 0.6490 0.3348 -2649.4885 -3110.3245 0.2064 0.1022
0.6416 0.48 1840 0.6525 1.0218 0.7311 0.6510 0.2908 -2677.5134 -3142.7522 0.2618 0.1602
0.647 0.49 1860 0.6554 1.3002 0.9643 0.6440 0.3360 -2654.1936 -3114.9124 0.2039 0.1007
0.6269 0.49 1880 0.6585 0.7954 0.5231 0.6350 0.2724 -2698.3127 -3165.3899 0.2689 0.1661
0.7114 0.5 1900 0.6589 0.6154 0.3766 0.6370 0.2388 -2712.9587 -3183.3887 0.2911 0.1904
0.6789 0.5 1920 0.6563 0.7003 0.4604 0.6350 0.2400 -2704.5811 -3174.8984 0.2714 0.1702
0.6729 0.51 1940 0.6574 1.2669 0.9475 0.6420 0.3194 -2655.8650 -3118.2434 0.1795 0.0734
0.6502 0.51 1960 0.6607 1.4160 1.0771 0.6400 0.3390 -2642.9128 -3103.3286 0.1572 0.0508
0.6567 0.52 1980 0.6547 0.9924 0.7209 0.6440 0.2715 -2678.5286 -3145.6885 0.2263 0.1233
0.66 0.52 2000 0.6564 0.9395 0.6803 0.6410 0.2592 -2682.5881 -3150.9863 0.2323 0.1301
0.6165 0.53 2020 0.6539 1.1203 0.8204 0.6420 0.2999 -2668.5769 -3132.9045 0.2117 0.1094
0.7214 0.53 2040 0.6555 1.3331 0.9914 0.6430 0.3418 -2651.4824 -3111.6213 0.1934 0.0901
0.6622 0.54 2060 0.6509 1.2432 0.9268 0.6400 0.3164 -2657.9395 -3120.6147 0.1900 0.0865
0.6141 0.54 2080 0.6504 1.1034 0.8115 0.6370 0.2919 -2669.4675 -3134.5964 0.2067 0.1041
0.6511 0.55 2100 0.6495 1.3362 1.0167 0.6470 0.3195 -2648.9509 -3111.3123 0.1578 0.0529
0.6457 0.55 2120 0.6507 1.4016 1.0814 0.6300 0.3202 -2642.4827 -3104.7749 0.1297 0.0242
0.6444 0.56 2140 0.6481 0.9908 0.7249 0.6460 0.2659 -2678.1279 -3145.8511 0.1869 0.0838
0.6709 0.57 2160 0.6469 1.1710 0.8782 0.6470 0.2928 -2662.7959 -3127.8286 0.1521 0.0463
0.7217 0.57 2180 0.6496 0.8703 0.6234 0.6410 0.2469 -2688.2808 -3157.9065 0.1928 0.0898
0.7032 0.58 2200 0.6462 1.2924 0.9830 0.6350 0.3094 -2652.3159 -3115.6887 0.1211 0.0142
0.729 0.58 2220 0.6603 1.7124 1.3448 0.6340 0.3676 -2616.1379 -3073.6912 0.0609 -0.0472
0.6496 0.59 2240 0.6475 1.2981 0.9806 0.6440 0.3175 -2652.5581 -3115.1221 0.1405 0.0349
0.6615 0.59 2260 0.6476 1.3386 1.0066 0.6450 0.3320 -2649.9587 -3111.0693 0.1516 0.0464
0.6581 0.6 2280 0.6458 1.0039 0.7166 0.6520 0.2873 -2678.9626 -3144.5474 0.2101 0.1083
0.6604 0.6 2300 0.6468 0.9760 0.6927 0.6510 0.2833 -2681.3484 -3147.3301 0.2123 0.1119
0.6762 0.61 2320 0.6451 1.2231 0.9037 0.6540 0.3194 -2660.2520 -3122.6216 0.1764 0.0751
0.6687 0.61 2340 0.6448 1.0471 0.7491 0.6470 0.2980 -2675.7124 -3140.2263 0.2063 0.1060
0.6154 0.62 2360 0.6460 1.3661 1.0244 0.6510 0.3417 -2648.1831 -3108.3257 0.1519 0.0509
0.712 0.62 2380 0.6491 1.4910 1.1296 0.6490 0.3613 -2637.6560 -3095.8364 0.1400 0.0397
0.675 0.63 2400 0.6467 0.8895 0.6147 0.6510 0.2748 -2689.1521 -3155.9834 0.2318 0.1331
0.6251 0.63 2420 0.6458 0.9209 0.6407 0.6540 0.2802 -2686.5471 -3152.8416 0.2377 0.1404
0.58 0.64 2440 0.6451 1.0306 0.7363 0.6470 0.2943 -2676.9885 -3141.8696 0.2140 0.1162
0.6538 0.64 2460 0.6477 1.5124 1.1437 0.6430 0.3688 -2636.2539 -3093.6924 0.1432 0.0436
0.6741 0.65 2480 0.6436 1.2072 0.8802 0.6460 0.3270 -2662.5972 -3124.2100 0.1846 0.0856
0.6109 0.65 2500 0.6447 1.1234 0.8068 0.6500 0.3166 -2669.9399 -3132.5901 0.1969 0.0982
0.6749 0.66 2520 0.6447 1.1995 0.8679 0.6470 0.3316 -2663.8308 -3124.9836 0.1925 0.0972
0.6524 0.66 2540 0.6449 1.1229 0.8033 0.6430 0.3196 -2670.2866 -3132.6394 0.2064 0.1134
0.6155 0.67 2560 0.6445 1.2928 0.9541 0.6490 0.3388 -2655.2100 -3115.6487 0.1766 0.0820
0.6498 0.68 2580 0.6460 1.4062 1.0492 0.6530 0.3570 -2645.6958 -3104.3142 0.1537 0.0579
0.6205 0.68 2600 0.6453 1.4175 1.0608 0.6500 0.3567 -2644.5352 -3103.1826 0.1426 0.0455
0.6644 0.69 2620 0.6438 1.2662 0.9337 0.6520 0.3326 -2657.2537 -3118.3110 0.1690 0.0736
0.6403 0.69 2640 0.6467 0.8363 0.5591 0.6530 0.2772 -2694.7085 -3161.3059 0.2464 0.1527
0.6697 0.7 2660 0.6505 0.7270 0.4575 0.6480 0.2696 -2704.8721 -3172.2310 0.2698 0.1761
0.586 0.7 2680 0.6468 0.9120 0.6146 0.6530 0.2973 -2689.1567 -3153.7361 0.2405 0.1441
0.7133 0.71 2700 0.6477 0.9017 0.6019 0.6550 0.2997 -2690.4275 -3154.7668 0.2465 0.1499
0.6203 0.71 2720 0.6453 1.1435 0.8143 0.6560 0.3292 -2669.1887 -3130.5833 0.2043 0.1065
0.6403 0.72 2740 0.6447 1.1619 0.8317 0.6600 0.3302 -2667.4482 -3128.7390 0.1988 0.1014
0.6562 0.72 2760 0.6440 1.2726 0.9351 0.6550 0.3374 -2657.1047 -3117.6772 0.1750 0.0771
0.6216 0.73 2780 0.6433 1.1472 0.8271 0.6570 0.3201 -2667.9097 -3130.2151 0.1984 0.1010
0.6439 0.73 2800 0.6434 1.1500 0.8274 0.6630 0.3226 -2667.8799 -3129.9346 0.2031 0.1054
0.6545 0.74 2820 0.6444 1.2737 0.9325 0.6570 0.3412 -2657.3660 -3117.5645 0.1840 0.0854
0.5712 0.74 2840 0.6442 1.3124 0.9665 0.6590 0.3459 -2653.9678 -3113.6951 0.1738 0.0740
0.6623 0.75 2860 0.6435 1.2882 0.9459 0.6590 0.3424 -2656.0342 -3116.1118 0.1759 0.0758
0.6491 0.75 2880 0.6429 1.0676 0.7540 0.6630 0.3136 -2675.2224 -3138.1736 0.2086 0.1087
0.6316 0.76 2900 0.6444 0.9143 0.6184 0.6560 0.2959 -2688.7827 -3153.5068 0.2324 0.1333
0.6851 0.76 2920 0.6433 0.9858 0.6757 0.6550 0.3102 -2683.0530 -3146.3491 0.2232 0.1233
0.6261 0.77 2940 0.6436 1.0911 0.7674 0.6610 0.3237 -2673.8782 -3135.8259 0.2103 0.1108
0.591 0.77 2960 0.6434 1.0843 0.7597 0.6610 0.3246 -2674.6450 -3136.4993 0.2118 0.1125
0.6719 0.78 2980 0.6440 1.0943 0.7677 0.6630 0.3266 -2673.8528 -3135.5054 0.2090 0.1100
0.6609 0.79 3000 0.6442 1.0791 0.7548 0.6630 0.3243 -2675.1423 -3137.0229 0.2128 0.1147
0.6365 0.79 3020 0.6446 1.1918 0.8544 0.6620 0.3374 -2665.1812 -3125.7544 0.1954 0.0969
0.6146 0.8 3040 0.6441 1.1548 0.8233 0.6600 0.3315 -2668.2886 -3129.4490 0.2033 0.1046
0.6289 0.8 3060 0.6435 1.0469 0.7296 0.6610 0.3172 -2677.6558 -3140.2471 0.2190 0.1207
0.6233 0.81 3080 0.6443 0.9655 0.6584 0.6570 0.3072 -2684.7822 -3148.3809 0.2312 0.1331
0.5942 0.81 3100 0.6441 1.0521 0.7311 0.6620 0.3210 -2677.5120 -3139.7278 0.2208 0.1215
0.6646 0.82 3120 0.6439 1.0663 0.7436 0.6590 0.3226 -2676.2566 -3138.3083 0.2200 0.1207
0.7201 0.82 3140 0.6431 1.0673 0.7465 0.6630 0.3208 -2675.9697 -3138.2017 0.2170 0.1173
0.684 0.83 3160 0.6429 1.0782 0.7570 0.6630 0.3213 -2674.9221 -3137.1096 0.2138 0.1138
0.6372 0.83 3180 0.6424 1.0512 0.7307 0.6610 0.3205 -2677.5535 -3139.8164 0.2199 0.1195
0.6491 0.84 3200 0.6429 0.9864 0.6737 0.6610 0.3127 -2683.2532 -3146.2932 0.2311 0.1313
0.6321 0.84 3220 0.6419 1.0593 0.7374 0.6640 0.3218 -2676.8789 -3139.0081 0.2190 0.1184
0.6858 0.85 3240 0.6418 1.1185 0.7905 0.6670 0.3281 -2671.5710 -3133.0784 0.2093 0.1081
0.6487 0.85 3260 0.6414 1.1003 0.7762 0.6670 0.3241 -2673.0029 -3134.9077 0.2102 0.1092
0.6232 0.86 3280 0.6418 1.0890 0.7641 0.6650 0.3249 -2674.2104 -3136.0315 0.2155 0.1153
0.6751 0.86 3300 0.6423 1.1216 0.7925 0.6690 0.3291 -2671.3660 -3132.7705 0.2116 0.1113
0.6696 0.87 3320 0.6420 1.1138 0.7855 0.6650 0.3283 -2672.0674 -3133.5513 0.2124 0.1127
0.6762 0.87 3340 0.6418 1.0429 0.7242 0.6670 0.3187 -2678.2026 -3140.6455 0.2234 0.1238
0.6431 0.88 3360 0.6423 0.9878 0.6777 0.6680 0.3100 -2682.8467 -3146.1572 0.2324 0.1334
0.6533 0.88 3380 0.6422 0.9657 0.6575 0.6670 0.3082 -2684.8696 -3148.3625 0.2357 0.1369
0.6517 0.89 3400 0.6415 1.0024 0.6893 0.6660 0.3132 -2681.6929 -3144.6909 0.2319 0.1329
0.7125 0.9 3420 0.6420 0.9890 0.6795 0.6700 0.3095 -2682.6711 -3146.0359 0.2327 0.1341
0.655 0.9 3440 0.6418 0.9841 0.6752 0.6670 0.3089 -2683.0972 -3146.5217 0.2339 0.1353
0.6298 0.91 3460 0.6421 0.9683 0.6617 0.6670 0.3066 -2684.4517 -3148.1047 0.2362 0.1376
0.634 0.91 3480 0.6420 0.9671 0.6600 0.6640 0.3071 -2684.6169 -3148.2190 0.2363 0.1376
0.6325 0.92 3500 0.6422 0.9461 0.6408 0.6670 0.3053 -2686.5374 -3150.3208 0.2398 0.1410
0.6207 0.92 3520 0.6423 0.9349 0.6315 0.6670 0.3034 -2687.4702 -3151.4434 0.2420 0.1432
0.6435 0.93 3540 0.6423 0.9279 0.6254 0.6630 0.3025 -2688.0842 -3152.1453 0.2425 0.1440
0.6271 0.93 3560 0.6428 0.9143 0.6145 0.6670 0.2998 -2689.1689 -3153.5029 0.2442 0.1455
0.6405 0.94 3580 0.6426 0.9048 0.6055 0.6670 0.2994 -2690.0718 -3154.4497 0.2447 0.1459
0.6822 0.94 3600 0.6424 0.9191 0.6187 0.6610 0.3005 -2688.7505 -3153.0198 0.2428 0.1443
0.6431 0.95 3620 0.6423 0.9294 0.6263 0.6670 0.3031 -2687.9922 -3151.9922 0.2417 0.1429
0.6189 0.95 3640 0.6424 0.9340 0.6305 0.6690 0.3034 -2687.5674 -3151.5378 0.2410 0.1422
0.6516 0.96 3660 0.6424 0.9430 0.6385 0.6700 0.3045 -2686.7739 -3150.6345 0.2398 0.1409
0.6229 0.96 3680 0.6422 0.9399 0.6361 0.6680 0.3038 -2687.0042 -3150.9431 0.2402 0.1416
0.6209 0.97 3700 0.6424 0.9390 0.6353 0.6690 0.3037 -2687.0925 -3151.0369 0.2406 0.1419
0.5807 0.97 3720 0.6425 0.9358 0.6323 0.6700 0.3034 -2687.3884 -3151.3577 0.2408 0.1421
0.6304 0.98 3740 0.6423 0.9440 0.6394 0.6670 0.3047 -2686.6794 -3150.5283 0.2406 0.1419
0.6049 0.98 3760 0.6424 0.9451 0.6405 0.6660 0.3046 -2686.5706 -3150.4238 0.2391 0.1403
0.6624 0.99 3780 0.6424 0.9449 0.6407 0.6640 0.3042 -2686.5491 -3150.4412 0.2395 0.1407
0.6649 0.99 3800 0.6423 0.9422 0.6378 0.6660 0.3044 -2686.8362 -3150.7134 0.2403 0.1415
0.638 1.0 3820 0.6423 0.9449 0.6403 0.6670 0.3047 -2686.5935 -3150.4404 0.2397 0.1410
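
Each row above holds twelve whitespace-separated values that line up with the twelve header names, two of which ("Training Loss" and "Validation Loss") contain a space. A minimal parsing sketch follows. The snake_case column names are hypothetical labels chosen for illustration, and only three rows copied from the table are included.

```python
# Hypothetical snake_case names for the twelve header columns, in order.
COLUMNS = [
    "training_loss", "epoch", "step", "validation_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracies",
    "rewards_margins", "logps_rejected", "logps_chosen",
    "logits_rejected", "logits_chosen",
]

# Three rows copied verbatim from the table (steps 20, 3260, and 3820).
SAMPLE_ROWS = """\
0.6918 0.01 20 0.6927 0.0105 0.0089 0.4990 0.0016 -2749.7332 -3243.8831 0.4756 0.4010
0.6487 0.85 3260 0.6414 1.1003 0.7762 0.6670 0.3241 -2673.0029 -3134.9077 0.2102 0.1092
0.638 1.0 3820 0.6423 0.9449 0.6403 0.6670 0.3047 -2686.5935 -3150.4404 0.2397 0.1410
"""


def parse_row(line: str) -> dict:
    """Split one table row into a dict keyed by COLUMNS."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    row = dict(zip(COLUMNS, map(float, values)))
    row["step"] = int(row["step"])  # steps are whole numbers
    return row


rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]

# Among the sampled rows, find the checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r["validation_loss"])
print(best["step"], best["validation_loss"])  # 3260 0.6414
```

Pasting the full table into `SAMPLE_ROWS` allows the same comparison across every logged step.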

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • PyTorch 2.2.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
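
The presence of PEFT in the list above suggests that the published weights are an adapter applied on top of the base model. A minimal loading sketch follows, assuming the adapter is hosted under the repository ID chanchan7/llama-7b-dpo-qlora-relu and that access to the gated base model has been granted. The function name `load_adapter` is hypothetical, and precision or quantization settings are left at their defaults because this card does not specify them.

```python
def load_adapter(
    base_id: str = "meta-llama/Llama-2-7b-chat-hf",
    adapter_id: str = "chanchan7/llama-7b-dpo-qlora-relu",  # assumed repo ID
):
    """Load the base model and attach the PEFT adapter on top of it."""
    # Imported here so that defining the function does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model
```

Calling `load_adapter()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 7B-parameter model.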