PE_Llama_2_7b_sft_rlhf

This model is an RLHF-style preference-optimized fine-tune of Llama 2 7B (applied after an SFT stage, as the name indicates), trained on a dataset stored in Arrow format. It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):

  • Loss: 0.0093
  • Rewards/chosen: -7.0331
  • Rewards/rejected: -29.3861
  • Rewards/accuracies: 0.9916
  • Rewards/margins: 22.3530
  • Logps/rejected: -118.6765
  • Logps/chosen: -90.0482
  • Logits/rejected: -1.3495
  • Logits/chosen: -1.4301
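
These metric names follow the convention of TRL's DPOTrainer; assuming that setup (the card itself does not name the trainer), the implicit reward of a completion y for a prompt x is a scaled log-probability ratio between the trained policy and a frozen reference model:

```latex
% Implicit DPO reward (assumed TRL convention; \beta is the DPO
% temperature, which this card does not report):
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% Rewards/margins averages the chosen-minus-rejected reward gap over
% eval pairs, and Rewards/accuracies is the fraction ranked correctly:
\mathrm{margins} = \mathbb{E}\!\left[ r_\theta(x, y_w) - r_\theta(x, y_l) \right],
\qquad
\mathrm{accuracies} = \mathbb{E}\!\left[ \mathbf{1}\{ r_\theta(x, y_w) > r_\theta(x, y_l) \} \right]
```

Consistent with this, the reported margin equals the chosen reward minus the rejected reward: -7.0331 - (-29.3861) = 22.3530.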

Model description

The checkpoint is a 6.74B-parameter causal language model released as BF16 safetensors. Beyond that, more information is needed.

Intended uses & limitations

More information needed
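
The authors give no usage guidance, so the following is only a minimal loading sketch. The repository id is inferred from the card title (a Hub namespace prefix may be required), and the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the card title; prepend the owner's namespace
# (e.g. "owner/PE_Llama_2_7b_sft_rlhf") when loading from the Hub.
model_id = "PE_Llama_2_7b_sft_rlhf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The released weights are BF16, so load in that dtype.
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Illustrative prompt:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```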

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-07
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
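
The per-device and aggregate batch sizes are consistent: 1 sample per device × 8 GPUs × 8 gradient-accumulation steps = 64 for training, and 2 × 8 = 16 for evaluation. The reward metrics above suggest a TRL DPOTrainer run; the sketch below shows how these hyperparameters would map onto that API. This is an assumption, not confirmed by the card: `model`, `ref_model`, the datasets, and `beta=0.1` (TRL's default DPO temperature) are placeholders.

```python
from transformers import TrainingArguments
from trl import DPOTrainer  # assumed from the DPO-style reward metrics

# Effective train batch: 1 per device x 8 GPUs x 8 accumulation = 64.
args = TrainingArguments(
    output_dir="PE_Llama_2_7b_sft_rlhf",
    learning_rate=3e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # matches the BF16 tensor type of the released weights
)

# model, ref_model, and the preference datasets are placeholders; beta
# is not reported on this card, so TRL's default of 0.1 is shown.
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```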

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5577 | 0.05 | 100 | 0.5743 | -0.0890 | -0.3528 | 0.9022 | 0.2638 | -60.6098 | -76.1599 | -1.3076 | -1.3716 |
| 0.1502 | 0.09 | 200 | 0.1761 | -0.5864 | -2.4951 | 0.9804 | 1.9086 | -64.8944 | -77.1548 | -1.3397 | -1.4091 |
| 0.0367 | 0.14 | 300 | 0.0640 | -1.1815 | -4.8466 | 0.9860 | 3.6651 | -69.5975 | -78.3450 | -1.3685 | -1.4428 |
| 0.0195 | 0.18 | 400 | 0.0419 | -1.6306 | -6.4153 | 0.9832 | 4.7847 | -72.7348 | -79.2431 | -1.3875 | -1.4648 |
| 0.0128 | 0.23 | 500 | 0.0321 | -2.1351 | -8.0395 | 0.9860 | 5.9044 | -75.9833 | -80.2522 | -1.4045 | -1.4847 |
| 0.0078 | 0.27 | 600 | 0.0294 | -2.8235 | -9.6992 | 0.9860 | 6.8757 | -79.3027 | -81.6291 | -1.4163 | -1.4986 |
| 0.0074 | 0.32 | 700 | 0.0177 | -2.7718 | -10.7772 | 0.9832 | 8.0054 | -81.4587 | -81.5256 | -1.4251 | -1.5079 |
| 0.0051 | 0.37 | 800 | 0.0144 | -2.4805 | -11.3179 | 0.9832 | 8.8374 | -82.5400 | -80.9429 | -1.4353 | -1.5181 |
| 0.003 | 0.41 | 900 | 0.0160 | -2.8352 | -12.2817 | 0.9860 | 9.4465 | -84.4677 | -81.6525 | -1.4421 | -1.5261 |
| 0.0031 | 0.46 | 1000 | 0.0122 | -2.8873 | -13.0359 | 0.9860 | 10.1487 | -85.9761 | -81.7565 | -1.4514 | -1.5345 |
| 0.0107 | 0.5 | 1100 | 0.0110 | -2.8383 | -13.0784 | 0.9888 | 10.2401 | -86.0611 | -81.6586 | -1.4506 | -1.5334 |
| 0.0065 | 0.55 | 1200 | 0.0130 | -3.3682 | -13.9857 | 0.9860 | 10.6176 | -87.8757 | -82.7184 | -1.4603 | -1.5441 |
| 0.0054 | 0.59 | 1300 | 0.0123 | -3.6048 | -14.8999 | 0.9888 | 11.2951 | -89.7041 | -83.1916 | -1.4576 | -1.5403 |
| 0.0048 | 0.64 | 1400 | 0.0091 | -3.3176 | -15.0505 | 0.9860 | 11.7329 | -90.0053 | -82.6172 | -1.4598 | -1.5418 |
| 0.0017 | 0.68 | 1500 | 0.0087 | -3.3081 | -15.5642 | 0.9860 | 12.2561 | -91.0327 | -82.5982 | -1.4671 | -1.5494 |
| 0.0042 | 0.73 | 1600 | 0.0091 | -3.5315 | -16.2814 | 0.9860 | 12.7498 | -92.4670 | -83.0451 | -1.4722 | -1.5560 |
| 0.0035 | 0.78 | 1700 | 0.0078 | -3.1483 | -15.9040 | 0.9916 | 12.7557 | -91.7122 | -82.2786 | -1.4664 | -1.5481 |
| 0.0094 | 0.82 | 1800 | 0.0071 | -2.9923 | -15.9175 | 0.9888 | 12.9251 | -91.7391 | -81.9667 | -1.4572 | -1.5390 |
| 0.0024 | 0.87 | 1900 | 0.0066 | -2.9861 | -16.5288 | 0.9916 | 13.5427 | -92.9619 | -81.9542 | -1.4690 | -1.5511 |
| 0.0067 | 0.91 | 2000 | 0.0076 | -3.2851 | -16.0301 | 0.9916 | 12.7450 | -91.9644 | -82.5522 | -1.4577 | -1.5391 |
| 0.0044 | 0.96 | 2100 | 0.0064 | -3.3414 | -16.8752 | 0.9944 | 13.5338 | -93.6545 | -82.6647 | -1.4617 | -1.5440 |
| 0.0025 | 1.0 | 2200 | 0.0060 | -3.1967 | -16.8252 | 0.9944 | 13.6285 | -93.5546 | -82.3753 | -1.4630 | -1.5444 |
| 0.0023 | 1.05 | 2300 | 0.0063 | -3.5595 | -17.6105 | 0.9916 | 14.0510 | -95.1253 | -83.1011 | -1.4645 | -1.5467 |
| 0.0055 | 1.1 | 2400 | 0.0070 | -4.0460 | -18.6662 | 0.9944 | 14.6201 | -97.2365 | -84.0740 | -1.4606 | -1.5441 |
| 0.0052 | 1.14 | 2500 | 0.0067 | -3.3185 | -17.6030 | 0.9944 | 14.2844 | -95.1102 | -82.6191 | -1.4679 | -1.5507 |
| 0.0023 | 1.19 | 2600 | 0.0064 | -3.4071 | -18.2406 | 0.9944 | 14.8335 | -96.3854 | -82.7962 | -1.4667 | -1.5501 |
| 0.0044 | 1.23 | 2700 | 0.0090 | -4.3343 | -19.6985 | 0.9916 | 15.3642 | -99.3012 | -84.6506 | -1.4647 | -1.5496 |
| 0.0033 | 1.28 | 2800 | 0.0113 | -4.6406 | -19.7381 | 0.9916 | 15.0976 | -99.3805 | -85.2631 | -1.4569 | -1.5408 |
| 0.0023 | 1.32 | 2900 | 0.0070 | -3.9341 | -19.4138 | 0.9944 | 15.4797 | -98.7318 | -83.8501 | -1.4612 | -1.5449 |
| 0.0034 | 1.37 | 3000 | 0.0066 | -3.7082 | -18.5209 | 0.9916 | 14.8127 | -96.9460 | -83.3983 | -1.4587 | -1.5399 |
| 0.0033 | 1.42 | 3100 | 0.0064 | -3.6694 | -18.6338 | 0.9972 | 14.9644 | -97.1717 | -83.3208 | -1.4480 | -1.5297 |
| 0.0034 | 1.46 | 3200 | 0.0059 | -3.7376 | -19.1673 | 0.9944 | 15.4298 | -98.2389 | -83.4571 | -1.4483 | -1.5307 |
| 0.0019 | 1.51 | 3300 | 0.0061 | -3.9735 | -19.7068 | 0.9916 | 15.7332 | -99.3178 | -83.9291 | -1.4459 | -1.5285 |
| 0.0011 | 1.55 | 3400 | 0.0066 | -4.3242 | -20.4806 | 0.9944 | 16.1564 | -100.8654 | -84.6304 | -1.4412 | -1.5245 |
| 0.0001 | 1.6 | 3500 | 0.0093 | -4.7847 | -21.0204 | 0.9916 | 16.2357 | -101.9450 | -85.5513 | -1.4308 | -1.5145 |
| 0.0037 | 1.64 | 3600 | 0.0076 | -4.5704 | -20.9595 | 0.9888 | 16.3891 | -101.8232 | -85.1228 | -1.4373 | -1.5209 |
| 0.003 | 1.69 | 3700 | 0.0087 | -4.7965 | -21.6522 | 0.9916 | 16.8557 | -103.2086 | -85.5750 | -1.4300 | -1.5148 |
| 0.0056 | 1.73 | 3800 | 0.0093 | -5.1262 | -22.2592 | 0.9916 | 17.1330 | -104.4226 | -86.2344 | -1.4213 | -1.5058 |
| 0.0024 | 1.78 | 3900 | 0.0113 | -5.8601 | -23.7638 | 0.9888 | 17.9037 | -107.4319 | -87.7022 | -1.4014 | -1.4856 |
| 0.0034 | 1.83 | 4000 | 0.0056 | -4.7077 | -22.5264 | 0.9944 | 17.8187 | -104.9570 | -85.3974 | -1.4252 | -1.5084 |
| 0.0044 | 1.87 | 4100 | 0.0055 | -4.2834 | -21.6926 | 0.9972 | 17.4092 | -103.2894 | -84.5488 | -1.4342 | -1.5165 |
| 0.0001 | 1.92 | 4200 | 0.0068 | -5.2542 | -23.4097 | 0.9916 | 18.1555 | -106.7237 | -86.4905 | -1.4219 | -1.5052 |
| 0.0044 | 1.96 | 4300 | 0.0075 | -5.2492 | -23.2824 | 0.9888 | 18.0332 | -106.4690 | -86.4804 | -1.4098 | -1.4921 |
| 0.0022 | 2.01 | 4400 | 0.0082 | -5.6200 | -23.9342 | 0.9944 | 18.3142 | -107.7725 | -87.2220 | -1.4087 | -1.4906 |
| 0.0033 | 2.05 | 4500 | 0.0091 | -5.9484 | -24.5607 | 0.9916 | 18.6123 | -109.0256 | -87.8787 | -1.4036 | -1.4857 |
| 0.0022 | 2.1 | 4600 | 0.0091 | -6.0570 | -25.0424 | 0.9916 | 18.9853 | -109.9890 | -88.0961 | -1.3980 | -1.4804 |
| 0.0011 | 2.15 | 4700 | 0.0100 | -6.3832 | -25.6097 | 0.9888 | 19.2265 | -111.1236 | -88.7484 | -1.3907 | -1.4732 |
| 0.0065 | 2.19 | 4800 | 0.0073 | -5.7898 | -25.1360 | 0.9916 | 19.3462 | -110.1763 | -87.5616 | -1.4006 | -1.4827 |
| 0.0022 | 2.24 | 4900 | 0.0091 | -6.1379 | -25.9334 | 0.9916 | 19.7955 | -111.7710 | -88.2578 | -1.3907 | -1.4732 |
| 0.0022 | 2.28 | 5000 | 0.0147 | -7.3728 | -27.6080 | 0.9888 | 20.2352 | -115.1203 | -90.7277 | -1.3738 | -1.4564 |
| 0.0033 | 2.33 | 5100 | 0.0120 | -6.9056 | -27.3057 | 0.9888 | 20.4002 | -114.5157 | -89.7931 | -1.3780 | -1.4604 |
| 0.0043 | 2.37 | 5200 | 0.0097 | -6.5949 | -27.6154 | 0.9888 | 21.0205 | -115.1350 | -89.1717 | -1.3772 | -1.4593 |
| 0.0022 | 2.42 | 5300 | 0.0152 | -7.5122 | -28.6578 | 0.9888 | 21.1456 | -117.2199 | -91.0065 | -1.3647 | -1.4465 |
| 0.0022 | 2.46 | 5400 | 0.0149 | -7.7072 | -29.4467 | 0.9888 | 21.7395 | -118.7977 | -91.3965 | -1.3515 | -1.4331 |
| 0.0001 | 2.51 | 5500 | 0.0137 | -7.6730 | -29.4473 | 0.9916 | 21.7743 | -118.7989 | -91.3281 | -1.3483 | -1.4293 |
| 0.0022 | 2.56 | 5600 | 0.0133 | -7.6989 | -29.6686 | 0.9916 | 21.9697 | -119.2415 | -91.3798 | -1.3485 | -1.4299 |
| 0.0011 | 2.6 | 5700 | 0.0095 | -6.8592 | -28.9672 | 0.9888 | 22.1080 | -117.8385 | -89.7003 | -1.3553 | -1.4366 |
| 0.0054 | 2.65 | 5800 | 0.0077 | -6.4136 | -28.4244 | 0.9916 | 22.0108 | -116.7531 | -88.8093 | -1.3637 | -1.4450 |
| 0.0033 | 2.69 | 5900 | 0.0115 | -7.6490 | -30.1521 | 0.9888 | 22.5031 | -120.2085 | -91.2800 | -1.3400 | -1.4208 |
| 0.0011 | 2.74 | 6000 | 0.0086 | -6.8537 | -29.1407 | 0.9888 | 22.2870 | -118.1857 | -89.6894 | -1.3510 | -1.4317 |
| 0.0011 | 2.78 | 6100 | 0.0095 | -7.1201 | -29.6324 | 0.9888 | 22.5123 | -119.1690 | -90.2221 | -1.3452 | -1.4257 |
| 0.0022 | 2.83 | 6200 | 0.0086 | -6.8942 | -29.1673 | 0.9916 | 22.2731 | -118.2387 | -89.7703 | -1.3531 | -1.4335 |
| 0.0013 | 2.88 | 6300 | 0.0086 | -6.8366 | -29.0334 | 0.9916 | 22.1968 | -117.9710 | -89.6551 | -1.3543 | -1.4349 |
| 0.0033 | 2.92 | 6400 | 0.0096 | -7.0073 | -29.2913 | 0.9916 | 22.2840 | -118.4869 | -89.9966 | -1.3494 | -1.4303 |
| 0.0011 | 2.97 | 6500 | 0.0092 | -6.9778 | -29.3366 | 0.9916 | 22.3588 | -118.5774 | -89.9376 | -1.3494 | -1.4297 |

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1