IE_M2_1000steps_1e6rate_01beta_cSFTDPO

This model is a DPO fine-tune of tsavage68/IE_M2_1000steps_1e7rate_SFT on an unspecified preference dataset. It achieves the following results on the evaluation set (a sketch of how the DPO reward metrics are computed follows the list):

  • Loss: 0.3743
  • Rewards/chosen: -0.0096
  • Rewards/rejected: -8.8855
  • Rewards/accuracies: 0.4600
  • Rewards/margins: 8.8759
  • Logps/rejected: -129.8764
  • Logps/chosen: -42.3012
  • Logits/rejected: -2.8667
  • Logits/chosen: -2.7910
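
These are the standard DPO diagnostics: Rewards/chosen and Rewards/rejected are the implicit DPO rewards, i.e. beta times the policy-versus-reference log-probability gap on each response, and Rewards/margins is their difference. A minimal sketch, assuming beta = 0.1 (the "01beta" in the model name); the reference log-probabilities below are backed out from the final eval row purely for illustration:

```python
import torch
import torch.nn.functional as F

BETA = 0.1  # assumption: the "01beta" in the model name

def dpo_reward(policy_logp, ref_logp, beta=BETA):
    # Implicit DPO reward: beta * (policy log-prob - reference log-prob)
    return beta * (policy_logp - ref_logp)

def dpo_loss(chosen_reward, rejected_reward):
    # Per-pair DPO loss: -log sigmoid(chosen - rejected)
    return -F.logsigmoid(chosen_reward - rejected_reward)

# Reference log-probs are inferred from this card's final eval row
# (reported reward / beta added back onto the reported policy log-prob).
chosen = dpo_reward(torch.tensor(-42.3012), torch.tensor(-42.2052))     # ≈ -0.0096
rejected = dpo_reward(torch.tensor(-129.8764), torch.tensor(-41.0214))  # ≈ -8.8855
print(chosen - rejected)           # ≈ 8.8759, matching Rewards/margins
print(dpo_loss(chosen, rejected))  # one pair's loss term, not the eval-set average
```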

Model description

More information needed

Intended uses & limitations

More information needed
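
In the absence of documented usage, here is a minimal generation sketch using transformers. The model ID is taken from this card; the prompt, float16 dtype, and generation settings are illustrative assumptions ("IE" in the name suggests information extraction, but the expected prompt format is not documented):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_M2_1000steps_1e6rate_01beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption; a 7B-class checkpoint commonly served in FP16
    device_map="auto",          # requires accelerate
)

# Hypothetical prompt; the card does not specify the expected input format.
inputs = tokenizer("Extract the entities from: ...", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```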

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
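
A hedged reconstruction of this configuration with TRL's DPOTrainer. Everything shown is an assumption except the hyperparameters above and the base checkpoint named at the top of this card: the dataset is a placeholder, and the call assumes a TRL release contemporary with the framework versions below (where DPOTrainer still accepts tokenizer=):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_M2_1000steps_1e7rate_SFT"  # SFT checkpoint this card fine-tunes
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference data; the actual dataset is not documented in this card.
preference_dataset = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

config = DPOConfig(
    output_dir="IE_M2_1000steps_1e6rate_01beta_cSFTDPO",
    beta=0.1,                       # "01beta" in the model name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,                        # Adam betas/epsilon are the defaults listed above
)

trainer = DPOTrainer(model, ref_model, args=config,
                     train_dataset=preference_dataset, tokenizer=tokenizer)
trainer.train()
```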

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4507 | 0.4 | 50   | 0.3744 | -0.2335 | -4.5985 | 0.4600 | 4.3650 | -87.0066  | -44.5404 | -2.8777 | -2.8157 |
| 0.3812 | 0.8 | 100  | 0.3743 | -0.5108 | -6.6921 | 0.4600 | 6.1813 | -107.9430 | -47.3138 | -2.8657 | -2.7955 |
| 0.3119 | 1.2 | 150  | 0.3743 | -0.1626 | -7.3145 | 0.4600 | 7.1519 | -114.1667 | -43.8313 | -2.8595 | -2.7859 |
| 0.3639 | 1.6 | 200  | 0.3743 | -0.0733 | -7.7721 | 0.4600 | 7.6988 | -118.7424 | -42.9385 | -2.8656 | -2.7905 |
| 0.4332 | 2.0 | 250  | 0.3743 | -0.0463 | -8.0479 | 0.4600 | 8.0016 | -121.5008 | -42.6684 | -2.8656 | -2.7903 |
| 0.3986 | 2.4 | 300  | 0.3743 | -0.0312 | -8.2241 | 0.4600 | 8.1929 | -123.2630 | -42.5179 | -2.8658 | -2.7905 |
| 0.3986 | 2.8 | 350  | 0.3743 | -0.0173 | -8.3343 | 0.4600 | 8.3171 | -124.3653 | -42.3781 | -2.8664 | -2.7908 |
| 0.4505 | 3.2 | 400  | 0.3743 | -0.0158 | -8.5177 | 0.4600 | 8.5019 | -126.1987 | -42.3632 | -2.8666 | -2.7910 |
| 0.4505 | 3.6 | 450  | 0.3743 | -0.0135 | -8.5518 | 0.4600 | 8.5383 | -126.5393 | -42.3402 | -2.8666 | -2.7910 |
| 0.4332 | 4.0 | 500  | 0.3743 | -0.0117 | -8.6642 | 0.4600 | 8.6525 | -127.6642 | -42.3228 | -2.8665 | -2.7909 |
| 0.3292 | 4.4 | 550  | 0.3743 | -0.0128 | -8.6957 | 0.4600 | 8.6829 | -127.9786 | -42.3337 | -2.8666 | -2.7910 |
| 0.3639 | 4.8 | 600  | 0.3743 | -0.0122 | -8.7991 | 0.4600 | 8.7869 | -129.0126 | -42.3276 | -2.8671 | -2.7915 |
| 0.4505 | 5.2 | 650  | 0.3743 | -0.0110 | -8.8312 | 0.4600 | 8.8202 | -129.3338 | -42.3151 | -2.8667 | -2.7910 |
| 0.4505 | 5.6 | 700  | 0.3743 | -0.0140 | -8.8523 | 0.4600 | 8.8383 | -129.5449 | -42.3457 | -2.8668 | -2.7911 |
| 0.3639 | 6.0 | 750  | 0.3743 | -0.0142 | -8.8760 | 0.4600 | 8.8618 | -129.7817 | -42.3476 | -2.8666 | -2.7909 |
| 0.2426 | 6.4 | 800  | 0.3743 | -0.0114 | -8.8848 | 0.4600 | 8.8734 | -129.8699 | -42.3197 | -2.8667 | -2.7910 |
| 0.5025 | 6.8 | 850  | 0.3743 | -0.0110 | -8.8824 | 0.4600 | 8.8714 | -129.8454 | -42.3153 | -2.8666 | -2.7910 |
| 0.3119 | 7.2 | 900  | 0.3743 | -0.0122 | -8.8932 | 0.4600 | 8.8810 | -129.9536 | -42.3276 | -2.8668 | -2.7911 |
| 0.3466 | 7.6 | 950  | 0.3743 | -0.0106 | -8.8884 | 0.4600 | 8.8778 | -129.9054 | -42.3112 | -2.8667 | -2.7910 |
| 0.3812 | 8.0 | 1000 | 0.3743 | -0.0096 | -8.8855 | 0.4600 | 8.8759 | -129.8764 | -42.3012 | -2.8667 | -2.7910 |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.0.0+cu117
  • Datasets 3.0.0
  • Tokenizers 0.19.1