---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mpt_1000_STEPS_1e8_rate_03_beta_DPO
    results: []
---

# mpt_1000_STEPS_1e8_rate_03_beta_DPO

This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) on an unknown dataset. It achieves the following results on the evaluation set (see below the list for how to read the DPO reward metrics):

- Loss: 0.6941
- Rewards/chosen: -1.2875
- Rewards/rejected: -1.6132
- Rewards/accuracies: 0.6154
- Rewards/margins: 0.3257
- Logps/rejected: -24.7839
- Logps/chosen: -23.3672
- Logits/rejected: 14.1648
- Logits/chosen: 14.1681
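
These are the metrics logged by TRL's `DPOTrainer`. In DPO (Rafailov et al., 2023), the implicit reward of a completion is the scaled log-ratio between the policy and a frozen reference model, and the training loss is the negative log-sigmoid of the reward margin between the chosen completion $y_w$ and the rejected completion $y_l$:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
$$

`Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards for each side, `Rewards/margins` is the mean difference between them, and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen completion receives the higher reward. The `_03_beta` suffix in the model name suggests β = 0.3, though the hyperparameter list below does not record it.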

## Model description

More information needed

## Intended uses & limitations

More information needed
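
Because this is a DPO fine-tune of `mosaicml/mpt-7b-instruct`, it loads and generates like its base model. A minimal inference sketch, assuming the repo id matches the card title under the author's namespace:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id (author namespace + card title); adjust if the model
# lives elsewhere.
repo_id = "tsavage68/mpt_1000_STEPS_1e8_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # MPT checkpoints ship custom modeling code
)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```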

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding TRL setup follows the list):

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
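
For illustration only, the configuration above corresponds to a TRL `DPOTrainer` run along these lines. The exact keyword arguments vary across `trl` releases, the preference dataset is not documented in this card (so `train_dataset` and `eval_dataset` are placeholders), and `beta=0.3` is inferred from the model name rather than recorded anywhere:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mosaicml/mpt-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
ref_model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base)

training_args = TrainingArguments(
    output_dir="mpt_1000_STEPS_1e8_rate_03_beta_DPO",
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=1,    # eval_batch_size
    gradient_accumulation_steps=2,   # total train batch size = 2 * 2 = 4
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the defaults.
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,           # frozen reference for the DPO log-ratios
    args=training_args,
    beta=0.3,                      # assumed from "_03_beta" in the name
    train_dataset=train_dataset,   # placeholder: dataset not documented
    eval_dataset=eval_dataset,     # placeholder
    tokenizer=tokenizer,
)
trainer.train()
```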

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7473        | 0.1   | 100  | 0.6927          | 0.1811         | 0.1159           | 0.5582             | 0.0651          | -21.3256       | -20.4301     | 14.3166         | 14.3195       |
| 0.7098        | 0.2   | 200  | 0.7624          | 0.6571         | 0.5345           | 0.5714             | 0.1226          | -20.4884       | -19.4780     | 14.1537         | 14.1566       |
| 0.7516        | 0.29  | 300  | 0.7505          | -0.8487        | -1.0927          | 0.5429             | 0.2440          | -23.7428       | -22.4895     | 14.5590         | 14.5620       |
| 0.7762        | 0.39  | 400  | 0.7476          | -2.2343        | -2.4798          | 0.5692             | 0.2455          | -26.5171       | -25.2608     | 14.0064         | 14.0094       |
| 0.8328        | 0.49  | 500  | 0.7228          | -1.5283        | -1.7877          | 0.5736             | 0.2594          | -25.1329       | -23.8488     | 14.1811         | 14.1843       |
| 0.625         | 0.59  | 600  | 0.7006          | -1.3183        | -1.6353          | 0.5978             | 0.3170          | -24.8281       | -23.4288     | 14.3453         | 14.3486       |
| 0.7164        | 0.68  | 700  | 0.7015          | -1.2944        | -1.6029          | 0.6022             | 0.3084          | -24.7632       | -23.3811     | 14.2239         | 14.2271       |
| 0.6844        | 0.78  | 800  | 0.6985          | -1.2758        | -1.5914          | 0.6198             | 0.3157          | -24.7403       | -23.3437     | 14.1630         | 14.1663       |
| 0.6996        | 0.88  | 900  | 0.6971          | -1.2896        | -1.6092          | 0.6110             | 0.3196          | -24.7758       | -23.3713     | 14.1673         | 14.1706       |
| 0.6352        | 0.98  | 1000 | 0.6941          | -1.2875        | -1.6132          | 0.6154             | 0.3257          | -24.7839       | -23.3672     | 14.1648         | 14.1681       |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2