---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e7rate_05beta_cSFTDPO
  results: []
---

# Na_M2_1000steps_1e7rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: 3.4353
- Rewards/rejected: -12.0460
- Rewards/accuracies: 1.0
- Rewards/margins: 15.4813
- Logps/rejected: -104.0153
- Logps/chosen: -41.2618
- Logits/rejected: -2.5171
- Logits/chosen: -2.5312

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding trainer setup follows the list):
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0           | 0.2667 | 50   | 0.0000          | 2.4333         | -8.7946          | 1.0                | 11.2279         | -97.5125       | -43.2658     | -2.5259         | -2.5391       |
| 0.0           | 0.5333 | 100  | 0.0000          | 2.7977         | -9.9936          | 1.0                | 12.7913         | -99.9105       | -42.5369     | -2.5223         | -2.5359       |
| 0.0           | 0.8    | 150  | 0.0000          | 2.9419         | -10.6551         | 1.0                | 13.5970         | -101.2335      | -42.2486     | -2.5210         | -2.5347       |
| 0.0           | 1.0667 | 200  | 0.0000          | 3.0397         | -10.9989         | 1.0                | 14.0386         | -101.9212      | -42.0530     | -2.5209         | -2.5347       |
| 0.0           | 1.3333 | 250  | 0.0000          | 3.1479         | -11.2365         | 1.0                | 14.3844         | -102.3963      | -41.8365     | -2.5209         | -2.5348       |
| 0.0           | 1.6    | 300  | 0.0000          | 3.1788         | -11.4604         | 1.0                | 14.6393         | -102.8442      | -41.7747     | -2.5197         | -2.5337       |
| 0.0           | 1.8667 | 350  | 0.0000          | 3.2803         | -11.6306         | 1.0                | 14.9109         | -103.1846      | -41.5718     | -2.5199         | -2.5339       |
| 0.0           | 2.1333 | 400  | 0.0000          | 3.3009         | -11.7868         | 1.0                | 15.0878         | -103.4970      | -41.5305     | -2.5189         | -2.5328       |
| 0.0           | 2.4    | 450  | 0.0000          | 3.3596         | -11.8664         | 1.0                | 15.2260         | -103.6562      | -41.4132     | -2.5179         | -2.5319       |
| 0.0           | 2.6667 | 500  | 0.0000          | 3.3481         | -11.9338         | 1.0                | 15.2818         | -103.7909      | -41.4363     | -2.5176         | -2.5316       |
| 0.0           | 2.9333 | 550  | 0.0000          | 3.3954         | -11.9591         | 1.0                | 15.3545         | -103.8415      | -41.3415     | -2.5186         | -2.5326       |
| 0.0           | 3.2    | 600  | 0.0000          | 3.4233         | -12.0436         | 1.0                | 15.4669         | -104.0106      | -41.2858     | -2.5181         | -2.5321       |
| 0.0           | 3.4667 | 650  | 0.0000          | 3.4170         | -12.0535         | 1.0                | 15.4704         | -104.0303      | -41.2985     | -2.5183         | -2.5323       |
| 0.0           | 3.7333 | 700  | 0.0000          | 3.3924         | -12.0736         | 1.0                | 15.4660         | -104.0705      | -41.3476     | -2.5178         | -2.5318       |
| 0.0           | 4.0    | 750  | 0.0000          | 3.4428         | -12.0566         | 1.0                | 15.4994         | -104.0365      | -41.2468     | -2.5180         | -2.5321       |
| 0.0           | 4.2667 | 800  | 0.0000          | 3.4331         | -12.0469         | 1.0                | 15.4800         | -104.0172      | -41.2661     | -2.5173         | -2.5314       |
| 0.0           | 4.5333 | 850  | 0.0000          | 3.4177         | -12.0794         | 1.0                | 15.4970         | -104.0821      | -41.2971     | -2.5172         | -2.5312       |
| 0.0           | 4.8    | 900  | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |
| 0.0           | 5.0667 | 950  | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1