---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e8rate_03beta_cSFTDPO
  results: []
---


# Na_M2_1000steps_1e8rate_03beta_cSFTDPO

This model is a DPO fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4450
- Rewards/chosen: 0.1680
- Rewards/rejected: -0.4255
- Rewards/accuracies: 1.0
- Rewards/margins: 0.5934
- Logps/rejected: -81.3416
- Logps/chosen: -47.5724
- Logits/rejected: -2.5355
- Logits/chosen: -2.5481
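
These are the standard `trl` DPO evaluation metrics: the chosen and rejected rewards are the β-scaled log-probability ratios between this policy and the frozen SFT reference, the margin is their difference (0.1680 − (−0.4255) ≈ 0.5934), and an accuracy of 1.0 means the chosen response received the higher reward on every evaluation pair. The loss is the DPO objective:

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)
$$

As a rough sanity check, −log σ(0.5934) ≈ 0.44, in line with the reported eval loss of 0.4450 (the two differ slightly because the loss is averaged per example rather than computed from the mean margin).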

## Model description

Not documented by the author. From the metadata above, this is a DPO checkpoint trained with `trl` on top of the SFT model [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT); the model name suggests 1000 training steps, a 1e-08 learning rate, and a DPO beta of 0.3.

## Intended uses & limitations

The intended use cases and evaluation scope are not documented. For basic text-generation inference, a hedged sketch follows; the expected prompt format is inherited from the SFT base model and is not described here.
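
A minimal inference sketch, assuming standard `transformers` loading (the example prompt is a placeholder, not from the card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Na_M2_1000steps_1e8rate_03beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Hello!"  # placeholder; the intended prompt format is not documented
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```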

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `trl` reconstruction follows the list):
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
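
A hypothetical reconstruction of this configuration with `trl`'s `DPOTrainer`, not the author's actual script; the dataset name is a placeholder and the beta value of 0.3 is inferred from "03beta" in the model name:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Na_M2_1000steps_1e7_SFT"  # SFT checkpoint used as the starting policy
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder: the actual preference dataset is not documented in this card.
# DPO expects prompt/chosen/rejected columns.
dataset = load_dataset("org/preference-dataset", split="train")

config = DPOConfig(
    output_dir="Na_M2_1000steps_1e8rate_03beta_cSFTDPO",
    beta=0.3,                       # assumption, inferred from the model name
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl clones the policy as the frozen reference when None
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```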

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6955        | 0.2667 | 50   | 0.6882          | 0.0099         | -0.0031          | 0.5600             | 0.0130          | -79.9338       | -48.0995     | -2.5354         | -2.5481       |
| 0.6761        | 0.5333 | 100  | 0.6730          | 0.0130         | -0.0315          | 0.6600             | 0.0445          | -80.0283       | -48.0889     | -2.5363         | -2.5489       |
| 0.6154        | 0.8    | 150  | 0.5971          | 0.0672         | -0.1393          | 0.9800             | 0.2065          | -80.3878       | -47.9083     | -2.5367         | -2.5493       |
| 0.5735        | 1.0667 | 200  | 0.5430          | 0.1029         | -0.2302          | 1.0                | 0.3331          | -80.6906       | -47.7893     | -2.5352         | -2.5478       |
| 0.5047        | 1.3333 | 250  | 0.5020          | 0.1363         | -0.3030          | 1.0                | 0.4393          | -80.9334       | -47.6779     | -2.5353         | -2.5478       |
| 0.4525        | 1.6    | 300  | 0.4751          | 0.1411         | -0.3685          | 1.0                | 0.5096          | -81.1517       | -47.6622     | -2.5350         | -2.5476       |
| 0.451         | 1.8667 | 350  | 0.4572          | 0.1576         | -0.3988          | 1.0                | 0.5564          | -81.2528       | -47.6072     | -2.5350         | -2.5475       |
| 0.4434        | 2.1333 | 400  | 0.4501          | 0.1391         | -0.4387          | 1.0                | 0.5778          | -81.3857       | -47.6686     | -2.5351         | -2.5477       |
| 0.4313        | 2.4    | 450  | 0.4454          | 0.1528         | -0.4370          | 1.0                | 0.5899          | -81.3802       | -47.6230     | -2.5343         | -2.5469       |
| 0.4546        | 2.6667 | 500  | 0.4513          | 0.1462         | -0.4293          | 1.0                | 0.5755          | -81.3544       | -47.6450     | -2.5345         | -2.5471       |
| 0.4526        | 2.9333 | 550  | 0.4424          | 0.1917         | -0.4110          | 1.0                | 0.6027          | -81.2934       | -47.4934     | -2.5352         | -2.5476       |
| 0.4426        | 3.2    | 600  | 0.4437          | 0.1805         | -0.4175          | 1.0                | 0.5980          | -81.3150       | -47.5307     | -2.5361         | -2.5486       |
| 0.4452        | 3.4667 | 650  | 0.4403          | 0.1651         | -0.4392          | 1.0                | 0.6043          | -81.3875       | -47.5821     | -2.5347         | -2.5473       |
| 0.418         | 3.7333 | 700  | 0.4450          | 0.1668         | -0.4237          | 1.0                | 0.5905          | -81.3358       | -47.5764     | -2.5348         | -2.5474       |
| 0.4281        | 4.0    | 750  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4503        | 4.2667 | 800  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4372        | 4.5333 | 850  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4135        | 4.8    | 900  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4316        | 5.0667 | 950  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4438        | 5.3333 | 1000 | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
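
Note that the evaluation metrics are effectively frozen from step 750 onward (the rows for steps 750 through 1000 are identical): with a peak learning rate of 1e-08 decayed further by the cosine schedule, the final updates are too small to move the evaluation loss.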


### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1