---
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged
model-index:
- name: WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.3-DPO
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# WeniGPT-Agents-Mistral-1.0.6-SFT-1.0.3-DPO

This model is a fine-tuned version of [Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged](https://huggingface.co/Weni/WeniGPT-Agents-Mistral-1.0.6-SFT-merged), trained with Direct Preference Optimization (DPO) using TRL and published as a PEFT adapter, on an unknown dataset.

It achieves the following results on the evaluation set (the last evaluation, at step 360):

- Loss: 0.3940
- Rewards/chosen: 2.1209
- Rewards/rejected: -0.7121
- Rewards/accuracies: 0.4643
- Rewards/margins: 2.8330
- Logps/rejected: -85.8883
- Logps/chosen: -44.2478
- Logits/rejected: -1.8122
- Logits/chosen: -1.7731
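
These DPO metrics are related by simple arithmetic. Assuming TRL's default sigmoid loss without label smoothing (the card states neither the loss type nor `beta`), each pair's implicit reward is `beta * (log pi_theta(y|x) - log pi_ref(y|x))`, `Rewards/margins` is the mean of chosen minus rejected rewards, and the per-pair loss is `-log(sigmoid(margin))`. The helpers below are a hypothetical illustration, not TRL's API.

```python
import math


def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    """DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def dpo_sigmoid_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Per-pair loss of the sigmoid DPO variant (assumed): -log(sigmoid(margin))."""
    margin = chosen_reward - rejected_reward
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Mean evaluation metrics reported above (last evaluation).
chosen, rejected, reported_loss = 2.1209, -0.7121, 0.3940
margin = chosen - rejected
print(f"Rewards/margins = {margin:.4f}")  # 2.8330

# -log(sigmoid) is convex, so by Jensen's inequality the loss at the *mean*
# margin is a lower bound on the *mean* per-pair loss.
print(dpo_sigmoid_loss(chosen, rejected) <= reported_loss)  # True
```

Because the loss is convex in the margin, the reported loss of 0.3940 is much larger than the loss evaluated at the mean margin of 2.8330 (about 0.057), so the mean margin alone is not enough to recover the loss.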

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4 (train_batch_size × gradient_accumulation_steps)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 366
- mixed_precision_training: Native AMP
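
With the `linear` scheduler and a warmup ratio of 0.03, the learning rate ramps up over the first ceil(0.03 × 366) = 11 steps and then decays linearly to zero at step 366. The sketch below is a standalone re-implementation that mirrors the lambda used by Transformers' `get_linear_schedule_with_warmup` and the rounding-up of a warmup ratio into whole steps; it is not the library code, and it assumes a single device, which is consistent with 2 × 2 = 4.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-6
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: consistent with total_train_batch_size = 4
TRAINING_STEPS = 366
WARMUP_RATIO = 0.03


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of training examples contributing to each optimizer update."""
    return per_device * accum * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Convert a warmup ratio into whole steps, rounding up."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


WARMUP = warmup_steps(TRAINING_STEPS, WARMUP_RATIO)
print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 4
print(WARMUP)  # 11
for step in (0, 5, 11, 188, 366):
    print(step, f"{linear_lr(step, LEARNING_RATE, WARMUP, TRAINING_STEPS):.3e}")
```

The peak learning rate of 5e-06 is therefore reached only at step 11, and by the last logged evaluation (step 360) it has decayed to under 2% of its peak.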

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6371        | 0.49  | 30   | 0.5865          | 0.2472         | -0.0086          | 0.4643             | 0.2558          | -83.5434       | -50.4937     | -1.7726         | -1.7380       |
| 0.5496        | 0.98  | 60   | 0.4964          | 0.5865         | -0.0274          | 0.4643             | 0.6139          | -83.6061       | -49.3627     | -1.7774         | -1.7420       |
| 0.5185        | 1.46  | 90   | 0.4402          | 1.0091         | -0.0981          | 0.4643             | 1.1072          | -83.8415       | -47.9539     | -1.7827         | -1.7461       |
| 0.4623        | 1.95  | 120  | 0.4217          | 1.2998         | -0.1810          | 0.4643             | 1.4808          | -84.1178       | -46.9850     | -1.7884         | -1.7512       |
| 0.4985        | 2.44  | 150  | 0.4069          | 1.5958         | -0.3227          | 0.4643             | 1.9185          | -84.5901       | -45.9983     | -1.7968         | -1.7591       |
| 0.5276        | 2.93  | 180  | 0.4012          | 1.7623         | -0.4253          | 0.4643             | 2.1876          | -84.9322       | -45.4432     | -1.8018         | -1.7638       |
| 0.5059        | 3.41  | 210  | 0.3993          | 1.8696         | -0.4661          | 0.4643             | 2.3356          | -85.0681       | -45.0858     | -1.8022         | -1.7637       |
| 0.4308        | 3.9   | 240  | 0.3972          | 1.9763         | -0.5593          | 0.4643             | 2.5356          | -85.3788       | -44.7300     | -1.8068         | -1.7681       |
| 0.4277        | 4.39  | 270  | 0.3954          | 2.0294         | -0.6326          | 0.4643             | 2.6620          | -85.6233       | -44.5531     | -1.8100         | -1.7711       |
| 0.4366        | 4.88  | 300  | 0.3951          | 2.0765         | -0.6602          | 0.4643             | 2.7367          | -85.7153       | -44.3961     | -1.8107         | -1.7718       |
| 0.4359        | 5.37  | 330  | 0.3941          | 2.1068         | -0.6947          | 0.4643             | 2.8015          | -85.8303       | -44.2949     | -1.8115         | -1.7724       |
| 0.4413        | 5.85  | 360  | 0.3940          | 2.1209         | -0.7121          | 0.4643             | 2.8330          | -85.8883       | -44.2478     | -1.8122         | -1.7731       |
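
The card does not report the DPO `beta`. Assuming TRL-style logging, where `Rewards/*` are means of `beta * (policy_logp - ref_logp)` and `Logps/*` are means of the policy log-probabilities over a fixed evaluation set with a frozen reference model, the reference term cancels between two evaluations, so `beta` can be estimated from how far the rewards and log-probabilities moved. The sketch below applies this to the step-30 and step-360 rows; `estimate_beta` is a hypothetical helper, and the result is an inference from rounded logs, not a documented setting.

```python
# Rows copied from the training-results table above (step -> metrics).
ROWS = {
    30: {"rewards_chosen": 0.2472, "rewards_rejected": -0.0086,
         "logps_chosen": -50.4937, "logps_rejected": -83.5434},
    360: {"rewards_chosen": 2.1209, "rewards_rejected": -0.7121,
          "logps_chosen": -44.2478, "logps_rejected": -85.8883},
}


def estimate_beta(a: dict, b: dict, side: str) -> float:
    """Assumes reward = beta * (policy_logp - ref_logp) with a fixed reference,
    so the reference term cancels in the difference between two evaluations."""
    d_reward = b[f"rewards_{side}"] - a[f"rewards_{side}"]
    d_logp = b[f"logps_{side}"] - a[f"logps_{side}"]
    return d_reward / d_logp


for side in ("chosen", "rejected"):
    print(side, round(estimate_beta(ROWS[30], ROWS[360], side), 3))  # both 0.3
```

The chosen and rejected columns give the same ratio independently, which supports the assumption that the logged rewards are scaled log-ratios with a single `beta` of about 0.3.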

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.0+cu118
- Datasets 2.18.0
- Tokenizers 0.15.2
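
To reproduce the training environment, the versions above can be compared against what is installed. The sketch below uses the standard-library `importlib.metadata`; the PyPI distribution names (for example `torch` for PyTorch) are assumptions about how the packages were installed, and the `+cu118` local build tag is ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; keys are assumed PyPI distribution names.
PINNED = {
    "peft": "0.10.0",
    "transformers": "4.38.2",
    "torch": "2.1.0",  # card lists 2.1.0+cu118; the CUDA build tag is ignored here
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def base_version(v: str) -> str:
    """Drop a PEP 440 local version label such as '+cu118'."""
    return v.split("+", 1)[0]


def find_mismatches(pinned: dict, lookup=version) -> dict:
    """Return {package: installed_version_or_None} for packages that differ from the pins."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed is None or base_version(installed) != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches(PINNED).items():
        print(f"{name}: card uses {PINNED[name]}, found {installed or 'not installed'}")
```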