# IE_M2_1000steps_1e6rate_01beta_cSFTDPO
This model is a fine-tuned version of tsavage68/IE_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.3743
- Rewards/chosen: -0.0096
- Rewards/rejected: -8.8855
- Rewards/accuracies: 0.4600
- Rewards/margins: 8.8759
- Logps/rejected: -129.8764
- Logps/chosen: -42.3012
- Logits/rejected: -2.8667
- Logits/chosen: -2.7910
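
The snippet below is a minimal usage sketch, not an official example from the author: it assumes the checkpoint loads with the standard `transformers` causal-LM classes and that the tokenizer carries the Mistral-Instruct chat template inherited from the base model. The prompt and generation settings are illustrative only.

```python
# Hedged usage sketch: standard transformers loading is assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_M2_1000steps_1e6rate_01beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative prompt; the card does not document the expected input format.
messages = [{"role": "user", "content": "Your instruction here."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```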
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
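
As a rough guide to how these settings could map onto a TRL `DPOTrainer` run, here is a hedged sketch. The `trl` API shown, the beta of 0.1 (inferred from "01beta" in the model name), and the placeholder preference datasets are assumptions; the card does not specify the training dataset or script.

```python
# Hedged sketch of a DPO run matching the hyperparameters above.
# Assumptions: trl's DPOConfig/DPOTrainer API, beta=0.1 inferred from the model
# name ("01beta"), and placeholder preference datasets (not given in the card).
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_model_id = "tsavage68/IE_M2_1000steps_1e7rate_SFT"
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)
model = AutoModelForCausalLM.from_pretrained(sft_model_id)

config = DPOConfig(
    output_dir="IE_M2_1000steps_1e6rate_01beta_cSFTDPO",
    beta=0.1,                         # assumed from "01beta" in the model name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,    # total train batch size 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,      # placeholder: prompt/chosen/rejected pairs
    eval_dataset=eval_dataset,        # placeholder
    tokenizer=tokenizer,
)
trainer.train()
```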
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.4507 | 0.4 | 50 | 0.3744 | -0.2335 | -4.5985 | 0.4600 | 4.3650 | -87.0066 | -44.5404 | -2.8777 | -2.8157 |
0.3812 | 0.8 | 100 | 0.3743 | -0.5108 | -6.6921 | 0.4600 | 6.1813 | -107.9430 | -47.3138 | -2.8657 | -2.7955 |
0.3119 | 1.2 | 150 | 0.3743 | -0.1626 | -7.3145 | 0.4600 | 7.1519 | -114.1667 | -43.8313 | -2.8595 | -2.7859 |
0.3639 | 1.6 | 200 | 0.3743 | -0.0733 | -7.7721 | 0.4600 | 7.6988 | -118.7424 | -42.9385 | -2.8656 | -2.7905 |
0.4332 | 2.0 | 250 | 0.3743 | -0.0463 | -8.0479 | 0.4600 | 8.0016 | -121.5008 | -42.6684 | -2.8656 | -2.7903 |
0.3986 | 2.4 | 300 | 0.3743 | -0.0312 | -8.2241 | 0.4600 | 8.1929 | -123.2630 | -42.5179 | -2.8658 | -2.7905 |
0.3986 | 2.8 | 350 | 0.3743 | -0.0173 | -8.3343 | 0.4600 | 8.3171 | -124.3653 | -42.3781 | -2.8664 | -2.7908 |
0.4505 | 3.2 | 400 | 0.3743 | -0.0158 | -8.5177 | 0.4600 | 8.5019 | -126.1987 | -42.3632 | -2.8666 | -2.7910 |
0.4505 | 3.6 | 450 | 0.3743 | -0.0135 | -8.5518 | 0.4600 | 8.5383 | -126.5393 | -42.3402 | -2.8666 | -2.7910 |
0.4332 | 4.0 | 500 | 0.3743 | -0.0117 | -8.6642 | 0.4600 | 8.6525 | -127.6642 | -42.3228 | -2.8665 | -2.7909 |
0.3292 | 4.4 | 550 | 0.3743 | -0.0128 | -8.6957 | 0.4600 | 8.6829 | -127.9786 | -42.3337 | -2.8666 | -2.7910 |
0.3639 | 4.8 | 600 | 0.3743 | -0.0122 | -8.7991 | 0.4600 | 8.7869 | -129.0126 | -42.3276 | -2.8671 | -2.7915 |
0.4505 | 5.2 | 650 | 0.3743 | -0.0110 | -8.8312 | 0.4600 | 8.8202 | -129.3338 | -42.3151 | -2.8667 | -2.7910 |
0.4505 | 5.6 | 700 | 0.3743 | -0.0140 | -8.8523 | 0.4600 | 8.8383 | -129.5449 | -42.3457 | -2.8668 | -2.7911 |
0.3639 | 6.0 | 750 | 0.3743 | -0.0142 | -8.8760 | 0.4600 | 8.8618 | -129.7817 | -42.3476 | -2.8666 | -2.7909 |
0.2426 | 6.4 | 800 | 0.3743 | -0.0114 | -8.8848 | 0.4600 | 8.8734 | -129.8699 | -42.3197 | -2.8667 | -2.7910 |
0.5025 | 6.8 | 850 | 0.3743 | -0.0110 | -8.8824 | 0.4600 | 8.8714 | -129.8454 | -42.3153 | -2.8666 | -2.7910 |
0.3119 | 7.2 | 900 | 0.3743 | -0.0122 | -8.8932 | 0.4600 | 8.8810 | -129.9536 | -42.3276 | -2.8668 | -2.7911 |
0.3466 | 7.6 | 950 | 0.3743 | -0.0106 | -8.8884 | 0.4600 | 8.8778 | -129.9054 | -42.3112 | -2.8667 | -2.7910 |
0.3812 | 8.0 | 1000 | 0.3743 | -0.0096 | -8.8855 | 0.4600 | 8.8759 | -129.8764 | -42.3012 | -2.8667 | -2.7910 |
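
For reference, the reward columns above follow the usual DPO bookkeeping (as logged by TRL's `DPOTrainer`): the implicit reward of a completion is the beta-scaled log-probability ratio against the reference (SFT) policy, and the margin is chosen minus rejected.

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\text{margin} = r_\theta(x, y_w) - r_\theta(x, y_l)
$$

For example, at the final step the reported margin is $-0.0096 - (-8.8855) = 8.8759$.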
### Framework versions
- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
## Model tree

- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Fine-tuned from: tsavage68/IE_M2_1000steps_1e7rate_SFT