---
license: other
base_model: deepseek-ai/deepseek-llm-7b-chat
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- self-generate/ds_chat_original_cn_mining_oj_iter0-binarized
- self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized
- self-generate/ds_chat_original_cn_rl_oj_iter0-binarized
model-index:
- name: ds_chat_sppo_hard_cosine_iter0_2024-09-16-16.38
  results: []
---


[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://ml.byteintl.net/experiment/tracking/detail?Id=project_20240915_20321b8f&selectedTrial=run_20240917_b8ad0ab9)
# ds_chat_sppo_hard_cosine_iter0_2024-09-16-16.38

This model is a fine-tuned version of [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat) on the self-generate/ds_chat_original_cn_mining_oj_iter0-binarized, self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized, and self-generate/ds_chat_original_cn_rl_oj_iter0-binarized datasets.
It achieves the following results on the evaluation set:
- Loss: 4957.3081
- Rewards/chosen: 0.0206
- Rewards/rejected: -0.0002
- Rewards/accuracies: 0.3026
- Rewards/margins: 0.0208
- Logps/rejected: -63.9058
- Logps/chosen: -121.0837
- Logits/rejected: 1.7198
- Logits/chosen: 1.6603
- Debug/policy Chosen Logits: 1.6603
- Debug/policy Rejected Logits: 1.7198
- Debug/policy Chosen Logps: -121.0837
- Debug/policy Rejected Logps: -63.9058
- Debug/reference Chosen Logps: -123.1481
- Debug/reference Rejected Logps: -63.8871
- Debug/sppo Chosen Reward In Loss: 2.0643
- Debug/sppo Rej Reward In Loss: -0.0187
- Debug/sppo Chosen Loss: 2387.4246
- Debug/sppo Reject Loss: 2498.1609
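
The reward columns above appear to follow the standard DPO-style reward, reward = β × (policy logp − reference logp) per sequence. β is not reported in this card, but β ≈ 0.01 reproduces the logged values from the policy/reference logps; a minimal check of that relationship:

```python
# Sketch: reconstructing the reward columns from the logged logps.
# beta is NOT stated in this card; 0.01 is inferred from the numbers
# below and may differ from the actual training configuration.
beta = 0.01

policy_chosen_logps = -121.0837
policy_rejected_logps = -63.9058
reference_chosen_logps = -123.1481
reference_rejected_logps = -63.8871

rewards_chosen = beta * (policy_chosen_logps - reference_chosen_logps)
rewards_rejected = beta * (policy_rejected_logps - reference_rejected_logps)

print(round(rewards_chosen, 4))                     # 0.0206  (Rewards/chosen)
print(round(rewards_rejected, 4))                   # -0.0002 (Rewards/rejected)
print(round(rewards_chosen - rewards_rejected, 4))  # 0.0208  (Rewards/margins)
```

The unscaled log-ratios (2.0643 and -0.0187) are what the card logs as Debug/sppo Chosen Reward In Loss and Debug/sppo Rej Reward In Loss.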

## Model description

More information needed

## Intended uses & limitations

More information needed
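
As a starting point, the checkpoint should load like any other causal LM. A minimal inference sketch, assuming the tokenizer carries the deepseek-llm-7b-chat chat template (the repository ID below is a placeholder):

```python
# Minimal inference sketch. The repository ID is a PLACEHOLDER; substitute
# the actual checkpoint path or Hub ID for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/ds_chat_sppo_hard_cosine_iter0"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```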

## Training and evaluation data

More information needed
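
The three preference datasets named in the metadata live on the Hub; a minimal loading sketch (the `train` split name is an assumption, so inspect each repo for its actual splits):

```python
# Sketch: loading the binarized preference data listed in the metadata.
# The "train" split name is an assumption.
from datasets import load_dataset

repos = [
    "self-generate/ds_chat_original_cn_mining_oj_iter0-binarized",
    "self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized",
    "self-generate/ds_chat_original_cn_rl_oj_iter0-binarized",
]
for name in repos:
    ds = load_dataset(name, split="train")
    print(name, len(ds), ds.column_names)
```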

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 8.0
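
For orientation, these values map onto a TRL `DPOConfig` roughly as sketched below. This is not the actual training script: `beta` and `loss_type` are not stated in the card (β ≈ 0.01 is inferred from the reward columns above, `sppo_hard` from the model name), and a TRL release that ships `DPOConfig` (~0.9.x) is assumed. Since 8 devices × per-device batch 8 already gives the total train batch of 64, no gradient accumulation is implied.

```python
# Hedged sketch of a DPOConfig/DPOTrainer setup consistent with the values
# above; NOT the actual training script. Assumes TRL ~0.9.x.
from datasets import concatenate_datasets, load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# "train" split is an assumption; see the dataset sketch above.
train_dataset = concatenate_datasets([
    load_dataset(name, split="train")
    for name in [
        "self-generate/ds_chat_original_cn_mining_oj_iter0-binarized",
        "self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized",
        "self-generate/ds_chat_original_cn_rl_oj_iter0-binarized",
    ]
])

args = DPOConfig(
    output_dir="ds_chat_sppo_hard_cosine_iter0",
    learning_rate=1e-07,
    per_device_train_batch_size=8,  # x 8 GPUs -> total train batch 64
    per_device_eval_batch_size=4,   # x 8 GPUs -> total eval batch 32
    seed=42,
    num_train_epochs=8.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    warmup_steps=100,  # Transformers lets warmup_steps override warmup_ratio
    beta=0.01,              # assumption: inferred from the reward columns
    loss_type="sppo_hard",  # assumption: inferred from the model name
)

trainer = DPOTrainer(
    model=model,  # a reference model is cloned internally when ref_model is unset
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```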

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Chosen Logits | Debug/policy Rejected Logits | Debug/policy Chosen Logps | Debug/policy Rejected Logps | Debug/reference Chosen Logps | Debug/reference Rejected Logps | Debug/sppo Chosen Reward In Loss | Debug/sppo Rej Reward In Loss | Debug/sppo Chosen Loss | Debug/sppo Reject Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------------:|:----------------------------:|:-------------------------:|:---------------------------:|:----------------------------:|:------------------------------:|:--------------------------------:|:-----------------------------:|:----------------------:|:----------------------:|
| 4999.5461     | 0.3623 | 100  | 4988.0952       | 0.0050         | 0.0020           | 0.2763             | 0.0031          | -63.6883       | -122.6432    | 1.7269          | 1.6642        | 1.6642                     | 1.7269                       | -122.6432                 | -63.6883                    | -123.1481                    | -63.8871                       | 0.5049                           | 0.1988                        | 2453.1523              | 2523.2144              |
| 5011.4531     | 0.7246 | 200  | 4990.5610       | 0.0177         | 0.0058           | 0.3158             | 0.0119          | -63.3097       | -121.3786    | 1.7330          | 1.6732        | 1.6732                     | 1.7330                       | -121.3786                 | -63.3097                    | -123.1481                    | -63.8871                       | 1.7695                           | 0.5774                        | 2386.0396              | 2582.6948              |
| 4987.3762     | 1.0870 | 300  | 4987.7910       | 0.0199         | 0.0061           | 0.2632             | 0.0137          | -63.2725       | -121.1585    | 1.7421          | 1.6830        | 1.6830                     | 1.7421                       | -121.1585                 | -63.2725                    | -123.1481                    | -63.8871                       | 1.9895                           | 0.6145                        | 2385.2695              | 2590.7976              |
| 5014.9531     | 1.4493 | 400  | 4983.8423       | 0.0200         | 0.0047           | 0.2632             | 0.0152          | -63.4148       | -121.1519    | 1.7308          | 1.6711        | 1.6711                     | 1.7308                       | -121.1519                 | -63.4148                    | -123.1481                    | -63.8871                       | 1.9962                           | 0.4722                        | 2383.6707              | 2565.9753              |
| 5006.941      | 1.8116 | 500  | 4965.4326       | 0.0117         | -0.0005          | 0.3158             | 0.0122          | -63.9328       | -121.9733    | 1.7113          | 1.6503        | 1.6503                     | 1.7113                       | -121.9733                 | -63.9328                    | -123.1481                    | -63.8871                       | 1.1748                           | -0.0457                       | 2416.3770              | 2495.6252              |
| 4945.2656     | 2.1739 | 600  | 4971.4199       | 0.0165         | 0.0030           | 0.2632             | 0.0134          | -63.5826       | -121.4996    | 1.7310          | 1.6724        | 1.6724                     | 1.7310                       | -121.4996                 | -63.5826                    | -123.1481                    | -63.8871                       | 1.6485                           | 0.3045                        | 2391.6709              | 2537.9797              |
| 5016.1723     | 2.5362 | 700  | 4956.6055       | 0.0193         | 0.0038           | 0.3684             | 0.0155          | -63.5097       | -121.2218    | 1.7528          | 1.6919        | 1.6919                     | 1.7528                       | -121.2218                 | -63.5097                    | -123.1481                    | -63.8871                       | 1.9263                           | 0.3774                        | 2372.3936              | 2549.7046              |
| 4980.475      | 2.8986 | 800  | 4967.6992       | 0.0217         | 0.0048           | 0.3421             | 0.0169          | -63.4108       | -120.9796    | 1.7533          | 1.6937        | 1.6937                     | 1.7533                       | -120.9796                 | -63.4108                    | -123.1481                    | -63.8871                       | 2.1685                           | 0.4763                        | 2370.3362              | 2566.8535              |
| 4962.825      | 3.2609 | 900  | 4973.9316       | 0.0239         | 0.0047           | 0.3026             | 0.0192          | -63.4168       | -120.7541    | 1.7347          | 1.6754        | 1.6754                     | 1.7347                       | -120.7541                 | -63.4168                    | -123.1481                    | -63.8871                       | 2.3940                           | 0.4702                        | 2374.9814              | 2564.9277              |
| 4960.6797     | 3.6232 | 1000 | 4954.9062       | 0.0185         | 0.0027           | 0.3553             | 0.0158          | -63.6219       | -121.2982    | 1.7363          | 1.6773        | 1.6773                     | 1.7363                       | -121.2982                 | -63.6219                    | -123.1481                    | -63.8871                       | 1.8498                           | 0.2651                        | 2376.7742              | 2531.5662              |
| 4996.0746     | 3.9855 | 1100 | 4978.2021       | 0.0089         | -0.0022          | 0.3684             | 0.0112          | -64.1119       | -122.2532    | 1.6884          | 1.6291        | 1.6291                     | 1.6884                       | -122.2532                 | -64.1119                    | -123.1481                    | -63.8871                       | 0.8949                           | -0.2249                       | 2438.2773              | 2479.8074              |
| 4988.032      | 4.3478 | 1200 | 4952.4019       | 0.0171         | -0.0003          | 0.3816             | 0.0174          | -63.9132       | -121.4333    | 1.7223          | 1.6634        | 1.6634                     | 1.7223                       | -121.4333                 | -63.9132                    | -123.1481                    | -63.8871                       | 1.7148                           | -0.0261                       | 2381.5840              | 2497.4338              |
| 4982.1008     | 4.7101 | 1300 | 4951.4316       | 0.0171         | -0.0003          | 0.3553             | 0.0174          | -63.9127       | -121.4370    | 1.7192          | 1.6602        | 1.6602                     | 1.7192                       | -121.4370                 | -63.9127                    | -123.1481                    | -63.8871                       | 1.7111                           | -0.0257                       | 2388.1934              | 2497.4824              |
| 4966.7375     | 5.0725 | 1400 | 4954.5615       | 0.0185         | 0.0008           | 0.3289             | 0.0177          | -63.8112       | -121.3000    | 1.7216          | 1.6631        | 1.6631                     | 1.7216                       | -121.3000                 | -63.8112                    | -123.1481                    | -63.8871                       | 1.8480                           | 0.0759                        | 2383.4727              | 2508.1672              |
| 4937.6176     | 5.4348 | 1500 | 4952.7949       | 0.0157         | -0.0019          | 0.3289             | 0.0176          | -64.0738       | -121.5761    | 1.7099          | 1.6508        | 1.6508                     | 1.7099                       | -121.5761                 | -64.0738                    | -123.1481                    | -63.8871                       | 1.5720                           | -0.1868                       | 2396.6667              | 2483.3738              |
| 4969.5398     | 5.7971 | 1600 | 4948.7925       | 0.0184         | -0.0001          | 0.3289             | 0.0186          | -63.8999       | -121.3049    | 1.7190          | 1.6601        | 1.6601                     | 1.7190                       | -121.3049                 | -63.8999                    | -123.1481                    | -63.8871                       | 1.8432                           | -0.0128                       | 2383.5056              | 2498.8604              |
| 4931.8516     | 6.1594 | 1700 | 4959.4023       | 0.0213         | 0.0026           | 0.2632             | 0.0188          | -63.6300       | -121.0142    | 1.7206          | 1.6597        | 1.6597                     | 1.7206                       | -121.0142                 | -63.6300                    | -123.1481                    | -63.8871                       | 2.1339                           | 0.2570                        | 2381.4475              | 2532.8616              |
| 4953.9797     | 6.5217 | 1800 | 4962.0317       | 0.0210         | 0.0004           | 0.2895             | 0.0206          | -63.8433       | -121.0445    | 1.7201          | 1.6602        | 1.6602                     | 1.7201                       | -121.0445                 | -63.8433                    | -123.1481                    | -63.8871                       | 2.1036                           | 0.0438                        | 2382.3406              | 2504.5334              |
| 4965.893      | 6.8841 | 1900 | 4953.7192       | 0.0187         | 0.0005           | 0.3289             | 0.0182          | -63.8390       | -121.2794    | 1.7207          | 1.6619        | 1.6619                     | 1.7207                       | -121.2794                 | -63.8390                    | -123.1481                    | -63.8871                       | 1.8687                           | 0.0481                        | 2383.2534              | 2505.0400              |
| 4950.5336     | 7.2464 | 2000 | 4958.1733       | 0.0211         | 0.0004           | 0.3158             | 0.0207          | -63.8483       | -121.0380    | 1.7193          | 1.6611        | 1.6611                     | 1.7193                       | -121.0380                 | -63.8483                    | -123.1481                    | -63.8871                       | 2.1101                           | 0.0387                        | 2382.7937              | 2504.2783              |
| 4966.3176     | 7.6087 | 2100 | 4951.5176       | 0.0195         | -0.0005          | 0.3816             | 0.0200          | -63.9397       | -121.2030    | 1.7190          | 1.6607        | 1.6607                     | 1.7190                       | -121.2030                 | -63.9397                    | -123.1481                    | -63.8871                       | 1.9451                           | -0.0526                       | 2381.8259              | 2494.8140              |
| 4946.1824     | 7.9710 | 2200 | 4957.3081       | 0.0206         | -0.0002          | 0.3026             | 0.0208          | -63.9058       | -121.0837    | 1.7198          | 1.6603        | 1.6603                     | 1.7198                       | -121.0837                 | -63.9058                    | -123.1481                    | -63.8871                       | 2.0643                           | -0.0187                       | 2387.4246              | 2498.1609              |


### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1
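
A quick way to verify an environment against these pins:

```python
# Environment sanity check against the versions listed above.
import datasets, tokenizers, torch, transformers

assert transformers.__version__.startswith("4.42")
assert torch.__version__.startswith("2.3.0")
assert datasets.__version__.startswith("2.14")
assert tokenizers.__version__.startswith("0.19")
```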