---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e8rate_01beta_cSFTDPO
  results: []
---

# Na_M2_1000steps_1e8rate_01beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set (a usage sketch follows the list):
- Loss: 0.6023
- Rewards/chosen: 0.0529
- Rewards/rejected: -0.1392
- Rewards/accuracies: 1.0
- Rewards/margins: 0.1921
- Logps/rejected: -81.3154
- Logps/chosen: -47.6033
- Logits/rejected: -2.5345
- Logits/chosen: -2.5471
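
The reward columns are the implicit DPO rewards reported by TRL's `DPOTrainer`, i.e. `beta * (log_prob_policy - log_prob_reference)` for the chosen and rejected completions; `Rewards/margins` is chosen minus rejected, and `Rewards/accuracies` is the fraction of pairs where the chosen reward is higher.

Below is a minimal usage sketch assuming the standard `transformers` causal-LM API. The repository id is inferred from the card title and the base model's namespace, and the prompt is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id: the card title under the base model's namespace.
model_id = "tsavage68/Na_M2_1000steps_1e8rate_01beta_cSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the card does not document a prompt format.
inputs = tokenizer("Hello, how can I help you today?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```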

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (mirrored in the sketch after this list):
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
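
As a reference point, here is a hypothetical TRL training sketch that mirrors these hyperparameters. The exact `DPOTrainer`/`DPOConfig` API varies across TRL versions, the preference dataset is a placeholder (the actual training data is not documented), and `beta=0.1` is an assumption read off the `01beta` in the model name:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint named as the base model in the card header.
base_id = "tsavage68/Na_M2_1000steps_1e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder preference data in the prompt/chosen/rejected format
# DPOTrainer expects; the real dataset is not documented in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt:"],
    "chosen": [" a preferred completion."],
    "rejected": [" a dispreferred completion."],
})
eval_dataset = train_dataset

args = DPOConfig(
    output_dir="Na_M2_1000steps_1e8rate_01beta_cSFTDPO",
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    beta=0.1,  # assumed from "01beta" in the model name
)

trainer = DPOTrainer(
    model=model,  # reference model defaults to a frozen copy of `model`
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,  # newer TRL versions use processing_class= instead
)
trainer.train()
```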

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6929        | 0.2667 | 50   | 0.6931          | -0.0010        | -0.0014          | 0.5700             | 0.0003          | -79.9371       | -48.1427     | -2.5355         | -2.5481       |
| 0.6881        | 0.5333 | 100  | 0.6832          | 0.0062         | -0.0142          | 0.6900             | 0.0204          | -80.0656       | -48.0704     | -2.5357         | -2.5482       |
| 0.6652        | 0.8    | 150  | 0.6568          | 0.0223         | -0.0526          | 0.9500             | 0.0748          | -80.4490       | -47.9098     | -2.5356         | -2.5482       |
| 0.6475        | 1.0667 | 200  | 0.6389          | 0.0327         | -0.0794          | 1.0                | 0.1121          | -80.7177       | -47.8054     | -2.5355         | -2.5481       |
| 0.6224        | 1.3333 | 250  | 0.6217          | 0.0389         | -0.1104          | 1.0                | 0.1492          | -81.0270       | -47.7436     | -2.5352         | -2.5477       |
| 0.6068        | 1.6    | 300  | 0.6115          | 0.0553         | -0.1167          | 1.0                | 0.1720          | -81.0905       | -47.5798     | -2.5353         | -2.5478       |
| 0.6018        | 1.8667 | 350  | 0.6041          | 0.0523         | -0.1359          | 1.0                | 0.1882          | -81.2823       | -47.6092     | -2.5345         | -2.5471       |
| 0.5976        | 2.1333 | 400  | 0.6021          | 0.0543         | -0.1384          | 1.0                | 0.1927          | -81.3072       | -47.5892     | -2.5349         | -2.5474       |
| 0.5952        | 2.4    | 450  | 0.5993          | 0.0581         | -0.1408          | 1.0                | 0.1990          | -81.3318       | -47.5512     | -2.5343         | -2.5468       |
| 0.6013        | 2.6667 | 500  | 0.6022          | 0.0541         | -0.1384          | 1.0                | 0.1925          | -81.3071       | -47.5913     | -2.5347         | -2.5472       |
| 0.5981        | 2.9333 | 550  | 0.6027          | 0.0571         | -0.1340          | 1.0                | 0.1911          | -81.2633       | -47.5610     | -2.5348         | -2.5473       |
| 0.6006        | 3.2    | 600  | 0.6009          | 0.0589         | -0.1365          | 1.0                | 0.1954          | -81.2883       | -47.5433     | -2.5347         | -2.5473       |
| 0.5961        | 3.4667 | 650  | 0.6036          | 0.0539         | -0.1354          | 1.0                | 0.1893          | -81.2771       | -47.5931     | -2.5350         | -2.5476       |
| 0.5896        | 3.7333 | 700  | 0.6024          | 0.0550         | -0.1368          | 1.0                | 0.1918          | -81.2913       | -47.5819     | -2.5345         | -2.5471       |
| 0.593         | 4.0    | 750  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.603         | 4.2667 | 800  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.5989        | 4.5333 | 850  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.5879        | 4.8    | 900  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.5949        | 5.0667 | 950  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.5974        | 5.3333 | 1000 | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |


### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
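
To check that a local environment matches these pins, a quick version printout (package names as imported in Python):

```python
# Print installed versions to compare against the pins above.
import datasets
import tokenizers
import torch
import transformers

print("transformers", transformers.__version__)  # card: 4.44.2
print("torch", torch.__version__)                # card: 2.4.0+cu121
print("datasets", datasets.__version__)          # card: 2.21.0
print("tokenizers", tokenizers.__version__)      # card: 0.19.1
```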