
llama-7b-SFT-qlora-wiki_DPO_ds_RM_random_1024_r_64_alpha_16

This model was fine-tuned with DPO from dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged; the preference dataset used is not documented here. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 0.6801
  • Rewards/chosen: -0.1790
  • Rewards/rejected: -0.2369
  • Rewards/accuracies: 0.5469
  • Rewards/margins: 0.0578
  • Logps/rejected: -206.1121
  • Logps/chosen: -202.9860
  • Logits/rejected: 1.1465
  • Logits/chosen: 1.1674
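
The repository name indicates a QLoRA adapter (LoRA rank 64, alpha 16) trained with DPO on top of the merged SFT base. Below is a minimal loading sketch, assuming this repository holds a PEFT adapter; the repo IDs are taken from this card and the prompt is purely illustrative.

```python
# Minimal loading sketch (assumes this repo contains a PEFT/QLoRA adapter;
# skip the PeftModel step if the weights turn out to be merged).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged"
adapter_id = "dhmeltzer/llama-7b-SFT-qlora-wiki_DPO_ds_RM_random_1024_r_64_alpha_16"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Summarize what a wiki is in one sentence."  # illustrative only
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```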

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching TrainingArguments sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
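
For reference, here is a hedged reconstruction of these settings as a transformers.TrainingArguments object. The output_dir is a placeholder, optim="adamw_torch" is an assumption, and the Adam betas/epsilon listed above are the library defaults.

```python
# Sketch only: mirrors the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-7b-SFT-qlora-wiki_DPO",  # placeholder, not from this card
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,             # 32 x 4 = 128 effective train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    optim="adamw_torch",                       # assumption; betas=(0.9, 0.999), eps=1e-8 are the defaults
)
```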

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6904 | 0.1 | 19 | 0.6904 | -0.3143 | -0.3636 | 0.5458 | 0.0493 | -207.3793 | -204.3384 | 1.1224 | 1.1416 |
| 0.6725 | 0.21 | 38 | 0.6850 | -0.3901 | -0.4540 | 0.5547 | 0.0640 | -208.2836 | -205.0964 | 1.1270 | 1.1469 |
| 0.6818 | 0.31 | 57 | 0.6801 | -0.1790 | -0.2369 | 0.5469 | 0.0578 | -206.1121 | -202.9860 | 1.1465 | 1.1674 |
| 0.6671 | 0.41 | 76 | 0.6863 | -0.2598 | -0.3469 | 0.5580 | 0.0871 | -207.2126 | -203.7936 | 1.1468 | 1.1665 |
| 0.6683 | 0.52 | 95 | 0.6841 | -0.1475 | -0.2325 | 0.5502 | 0.0851 | -206.0687 | -202.6704 | 1.1388 | 1.1590 |
| 0.6626 | 0.62 | 114 | 0.6846 | -0.0836 | -0.1600 | 0.5480 | 0.0764 | -205.3429 | -202.0314 | 1.1263 | 1.1474 |
| 0.6593 | 0.72 | 133 | 0.6864 | -0.1272 | -0.2184 | 0.5625 | 0.0912 | -205.9276 | -202.4675 | 1.1106 | 1.1306 |
| 0.672 | 0.83 | 152 | 0.6857 | -0.1452 | -0.2334 | 0.5592 | 0.0882 | -206.0777 | -202.6477 | 1.1086 | 1.1293 |
| 0.6671 | 0.93 | 171 | 0.6855 | -0.1472 | -0.2350 | 0.5547 | 0.0878 | -206.0934 | -202.6673 | 1.1071 | 1.1270 |
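
The reward columns above follow the standard DPO convention: each example's reward is beta times the log-probability ratio between the policy and the frozen reference (SFT) model, the margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs where the chosen reward is higher. A sketch of that computation (beta is an assumption; its value is not reported in this card):

```python
# Illustrative DPO metric computation; inputs are per-example summed
# log-probabilities of the chosen/rejected responses under the policy
# and the frozen reference model.
import torch
import torch.nn.functional as F

beta = 0.1  # assumed DPO temperature, not reported in this card

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    loss = -F.logsigmoid(margins).mean()          # DPO loss
    accuracy = (rewards_chosen > rewards_rejected).float().mean()
    return {
        "loss": loss,
        "rewards/chosen": rewards_chosen.mean(),
        "rewards/rejected": rewards_rejected.mean(),
        "rewards/margins": margins.mean(),
        "rewards/accuracies": accuracy,
    }
```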

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.4
  • Tokenizers 0.13.3