
doplhin-dpo

This model is a DPO fine-tuned PEFT adapter for cognitivecomputations/dolphin-2.1-mistral-7b, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0809
  • Rewards/chosen: -22.5048
  • Rewards/rejected: -33.0285
  • Rewards/accuracies: 0.8076
  • Rewards/margins: 10.5237
  • Logps/rejected: -629.7220
  • Logps/chosen: -567.8747
  • Logits/rejected: -2.5481
  • Logits/chosen: -2.5972

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
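The cosine schedule with a 0.1 warmup ratio can be sketched as below. The total step count is hypothetical (the card does not state it); only `base_lr` and `warmup_ratio` come from the hyperparameters above.

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup for the first warmup_ratio of steps, then cosine decay.
    total_steps is a hypothetical value; the card does not report it."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

At step 0 the learning rate is 0, it peaks at 5e-05 when warmup ends, and it decays to 0 by the final step.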

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.7986 | 0.13 | 700 | 2.1962 | -19.2854 | -23.2541 | 0.6680 | 3.9687 | -531.9778 | -535.6807 | -2.6393 | -2.7332 |
| 1.3794 | 0.25 | 1400 | 1.5931 | -24.2833 | -32.1549 | 0.7393 | 7.8716 | -620.9865 | -585.6600 | -2.4941 | -2.6078 |
| 1.7768 | 0.38 | 2100 | 1.2640 | -24.9513 | -33.2837 | 0.7618 | 8.3324 | -632.2739 | -592.3398 | -1.5676 | -1.9552 |
| 1.0764 | 0.51 | 2800 | 1.1802 | -24.8340 | -32.7263 | 0.7807 | 7.8923 | -626.7006 | -591.1669 | -2.2188 | -2.3807 |
| 1.1698 | 0.64 | 3500 | 1.1290 | -17.1234 | -26.7346 | 0.7982 | 9.6112 | -566.7830 | -514.0612 | -2.6586 | -2.7169 |
| 1.1884 | 0.76 | 4200 | 1.0909 | -23.1635 | -33.5559 | 0.8044 | 10.3924 | -634.9959 | -574.4622 | -2.5670 | -2.6170 |
| 0.6424 | 0.89 | 4900 | 1.0809 | -22.5048 | -33.0285 | 0.8076 | 10.5237 | -629.7220 | -567.8747 | -2.5481 | -2.5972 |
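A sketch of how the DPO metrics in the table relate. In DPO, Rewards/chosen and Rewards/rejected are beta-scaled log-probability ratios against the reference model, and Rewards/margins is their difference; the per-pair loss is the negative log-sigmoid of the margin. The values below are the final-eval numbers from the last row.

```python
import math

# Final-eval reward values from the table above
rewards_chosen = -22.5048
rewards_rejected = -33.0285

# Rewards/margins column is simply the difference
margin = rewards_chosen - rewards_rejected  # 10.5237

# Per-pair DPO loss: -log sigmoid(margin)
per_pair_loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
```

A margin above 10 drives the per-pair loss close to zero; the reported validation loss of 1.0809 is an average over all evaluation pairs, many of which have much smaller or negative margins.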

Framework versions

  • PEFT 0.8.2
  • Transformers 4.37.2
  • Pytorch 2.2.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.2
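A hypothetical usage sketch for loading this PEFT adapter on top of the base model with the library versions above. The repository ids and fp16 dtype follow the card; this has not been verified against the checkpoint.

```python
# Hypothetical loading sketch; downloads the 7B base model and the adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "cognitivecomputations/dolphin-2.1-mistral-7b",
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "Liu-Xiang/doplhin-dpo")
tokenizer = AutoTokenizer.from_pretrained(
    "cognitivecomputations/dolphin-2.1-mistral-7b"
)
```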