---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e7rate_01beta_cSFTDPO
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Na_M2_1000steps_1e7rate_01beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT), trained with DPO (Direct Preference Optimization) using TRL on an unknown dataset.
After the final training step (step 1000), it achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: 2.1377
- Rewards/rejected: -11.0023
- Rewards/accuracies: 1.0
- Rewards/margins: 13.1400
- Logps/rejected: -189.9462
- Logps/chosen: -26.7554
- Logits/rejected: -2.3910
- Logits/chosen: -2.4209
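As a rough consistency check on these metrics, the sketch below assumes TRL's default sigmoid DPO loss with no label smoothing. The card does not say which loss variant was used. Under that assumption, `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected`, and the per-pair loss is -log(sigmoid(margin)). The helper names are illustrative and are not part of TRL. Because the logged loss is an average over pairs, evaluating the loss at the mean margin only indicates the order of magnitude. It does not reproduce the logged value exactly.

```python
import math

# Final evaluation metrics copied from the list above.
REWARD_CHOSEN = 2.1377
REWARD_REJECTED = -11.0023


def reward_margin(chosen: float, rejected: float) -> float:
    """Rewards/margins: implicit reward of the chosen response minus the rejected one."""
    return chosen - rejected


def sigmoid_dpo_loss(margin: float) -> float:
    """Numerically stable -log(sigmoid(margin)) for a single preference pair."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    margin = reward_margin(REWARD_CHOSEN, REWARD_REJECTED)
    print(f"margin = {margin:.4f}")  # margin = 13.1400
    print(f"loss at the mean margin = {sigmoid_dpo_loss(margin):.2e}")
```

At a margin of about 13.1, the loss evaluated at the mean is on the order of 1e-6, far below the four-decimal precision used in this card. This is why the loss is reported as 0.0000.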

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
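The hyperparameters above can be expanded into a per-step schedule. The sketch below assumes a single device, because the card does not say how many were used. With one device, the effective batch size is `train_batch_size × gradient_accumulation_steps`, which matches the listed `total_train_batch_size` of 4. The learning-rate function mirrors the formula behind `transformers.get_cosine_schedule_with_warmup` with its default half cycle: linear warmup over 100 steps, then cosine decay to zero at step 1000. The function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-7
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device: int, accum: int, devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum * devices


def cosine_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS, total: int = TRAINING_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay from base_lr to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 4
    for step in (0, 50, 100, 550, 900, 1000):
        print(f"step {step:4d}: lr = {cosine_lr(step):.3e}")
```

Under this schedule, the learning rate at step 900 is about 3% of its peak. This is consistent with the metrics in the training-results table barely changing over the final evaluations.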

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0514        | 0.2667 | 50   | 0.0095          | 1.0907         | -3.6125          | 1.0                | 4.7032          | -116.0486      | -37.2253     | -2.5048         | -2.5196       |
| 0.0           | 0.5333 | 100  | 0.0000          | 1.9516         | -8.2814          | 1.0                | 10.2330         | -162.7370      | -28.6162     | -2.4273         | -2.4516       |
| 0.0           | 0.8    | 150  | 0.0000          | 2.0205         | -8.9692          | 1.0                | 10.9897         | -169.6156      | -27.9274     | -2.4141         | -2.4403       |
| 0.0           | 1.0667 | 200  | 0.0000          | 2.0546         | -9.4358          | 1.0                | 11.4904         | -174.2812      | -27.5861     | -2.4057         | -2.4333       |
| 0.0           | 1.3333 | 250  | 0.0000          | 2.0861         | -9.8928          | 1.0                | 11.9789         | -178.8511      | -27.2716     | -2.4011         | -2.4294       |
| 0.0           | 1.6    | 300  | 0.0000          | 2.0968         | -10.1847         | 1.0                | 12.2815         | -181.7704      | -27.1646     | -2.3981         | -2.4268       |
| 0.0           | 1.8667 | 350  | 0.0000          | 2.1068         | -10.4154         | 1.0                | 12.5222         | -184.0774      | -27.0641     | -2.3951         | -2.4241       |
| 0.0           | 2.1333 | 400  | 0.0000          | 2.1173         | -10.5894         | 1.0                | 12.7067         | -185.8174      | -26.9596     | -2.3948         | -2.4241       |
| 0.0           | 2.4    | 450  | 0.0000          | 2.1209         | -10.7301         | 1.0                | 12.8510         | -187.2248      | -26.9235     | -2.3923         | -2.4219       |
| 0.0           | 2.6667 | 500  | 0.0000          | 2.1295         | -10.8281         | 1.0                | 12.9576         | -188.2044      | -26.8375     | -2.3924         | -2.4220       |
| 0.0           | 2.9333 | 550  | 0.0000          | 2.1355         | -10.9054         | 1.0                | 13.0409         | -188.9772      | -26.7771     | -2.3914         | -2.4212       |
| 0.0           | 3.2    | 600  | 0.0000          | 2.1356         | -10.9448         | 1.0                | 13.0805         | -189.3718      | -26.7761     | -2.3903         | -2.4200       |
| 0.0           | 3.4667 | 650  | 0.0000          | 2.1418         | -10.9896         | 1.0                | 13.1314         | -189.8192      | -26.7140     | -2.3895         | -2.4193       |
| 0.0           | 3.7333 | 700  | 0.0000          | 2.1378         | -11.0004         | 1.0                | 13.1382         | -189.9273      | -26.7544     | -2.3901         | -2.4200       |
| 0.0           | 4.0    | 750  | 0.0000          | 2.1390         | -11.0020         | 1.0                | 13.1409         | -189.9431      | -26.7428     | -2.3910         | -2.4208       |
| 0.0           | 4.2667 | 800  | 0.0000          | 2.1358         | -11.0021         | 1.0                | 13.1378         | -189.9439      | -26.7747     | -2.3902         | -2.4201       |
| 0.0           | 4.5333 | 850  | 0.0000          | 2.1380         | -11.0024         | 1.0                | 13.1404         | -189.9469      | -26.7523     | -2.3908         | -2.4207       |
| 0.0           | 4.8    | 900  | 0.0000          | 2.1377         | -11.0023         | 1.0                | 13.1400         | -189.9462      | -26.7554     | -2.3910         | -2.4209       |
| 0.0           | 5.0667 | 950  | 0.0000          | 2.1377         | -11.0023         | 1.0                | 13.1400         | -189.9462      | -26.7554     | -2.3910         | -2.4209       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 2.1377         | -11.0023         | 1.0                | 13.1400         | -189.9462      | -26.7554     | -2.3910         | -2.4209       |
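The card does not state the DPO `beta`. The sketch below assumes TRL's definition of the implicit reward, `reward = beta * (policy log-prob - reference log-prob)`. It also assumes that the `Logps` columns are the policy's log-probabilities on a fixed evaluation set. Under these assumptions, the reference log-probs stay constant across evaluations, so `beta` is the slope of reward against log-prob between any two rows. The sketch copies three rows from the table, checks that `Rewards/margins` equals chosen minus rejected up to rounding, and estimates `beta`. The helper names are hypothetical.

```python
# Rows copied from the training-results table, reordered as:
# (step, Rewards/chosen, Rewards/rejected, Rewards/margins, Logps/chosen, Logps/rejected)
ROWS = [
    (50, 1.0907, -3.6125, 4.7032, -37.2253, -116.0486),
    (500, 2.1295, -10.8281, 12.9576, -26.8375, -188.2044),
    (1000, 2.1377, -11.0023, 13.1400, -26.7554, -189.9462),
]


def margins_consistent(rows, tol=1e-3):
    """True if Rewards/margins == chosen - rejected for every row, up to rounding."""
    return all(abs((chosen - rejected) - margin) <= tol
               for _, chosen, rejected, margin, _, _ in rows)


def infer_beta(row_a, row_b):
    """Slope of reward vs. policy log-prob between two evaluations: (chosen, rejected)."""
    _, ca, ra, _, lca, lra = row_a
    _, cb, rb, _, lcb, lrb = row_b
    return (cb - ca) / (lcb - lca), (rb - ra) / (lrb - lra)


def reference_logps(row, beta):
    """Invert reward = beta * (policy - reference) to recover reference log-probs."""
    _, chosen, rejected, _, logp_chosen, logp_rejected = row
    return logp_chosen - chosen / beta, logp_rejected - rejected / beta


if __name__ == "__main__":
    print("margins consistent:", margins_consistent(ROWS))
    beta_c, beta_r = infer_beta(ROWS[0], ROWS[-1])
    print(f"beta from chosen = {beta_c:.4f}, from rejected = {beta_r:.4f}")
    for row in ROWS:
        ref_c, ref_r = reference_logps(row, beta=beta_c)
        print(f"step {row[0]:4d}: ref chosen = {ref_c:.3f}, ref rejected = {ref_r:.3f}")
```

Both slopes come out at approximately 0.1, which agrees with the `01beta` in the model name. The implied reference log-probs (about -48.13 for chosen and -79.92 for rejected responses) are essentially the same for every row. A frozen reference model would produce exactly this pattern.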


### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1