# ds_chat_sppo_hard_cosine_iter0_2024-09-16-16.38
This model is a fine-tuned version of deepseek-ai/deepseek-llm-7b-chat on the self-generate/ds_chat_original_cn_mining_oj_iter0-binarized, self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized, and self-generate/ds_chat_original_cn_rl_oj_iter0-binarized datasets. It achieves the following results on the evaluation set (a short sketch interpreting these metrics follows the list):
- Loss: 4957.3081
- Rewards/chosen: 0.0206
- Rewards/rejected: -0.0002
- Rewards/accuracies: 0.3026
- Rewards/margins: 0.0208
- Logps/rejected: -63.9058
- Logps/chosen: -121.0837
- Logits/rejected: 1.7198
- Logits/chosen: 1.6603
- Debug/policy Chosen Logits: 1.6603
- Debug/policy Rejected Logits: 1.7198
- Debug/policy Chosen Logps: -121.0837
- Debug/policy Rejected Logps: -63.9058
- Debug/reference Chosen Logps: -123.1481
- Debug/reference Rejected Logps: -63.8871
- Debug/sppo Chosen Reward In Loss: 2.0643
- Debug/sppo Rej Reward In Loss: -0.0187
- Debug/sppo Chosen Loss: 2387.4246
- Debug/sppo Reject Loss: 2498.1609
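The Rewards/* values above are consistent with the usual DPO/SPPO-style reward, reward = β · (log π_policy(y|x) − log π_ref(y|x)), with β = 0.01; the Debug/sppo *Reward In Loss values are then the raw log-probability ratios before scaling. The following is a minimal sketch, with β inferred from the numbers above rather than taken from a documented training argument:

```python
import torch

# Hedged reconstruction of the reported rewards. beta = 0.01 is an inference
# from the numbers above, not a documented training argument.
beta = 0.01

policy_chosen_logps = torch.tensor(-121.0837)
reference_chosen_logps = torch.tensor(-123.1481)
policy_rejected_logps = torch.tensor(-63.9058)
reference_rejected_logps = torch.tensor(-63.8871)

# Raw log-ratios match Debug/sppo Chosen/Rej Reward In Loss above.
chosen_logratio = policy_chosen_logps - reference_chosen_logps        # ~2.0644
rejected_logratio = policy_rejected_logps - reference_rejected_logps  # ~-0.0187

print(beta * chosen_logratio)    # ~0.0206  -> Rewards/chosen
print(beta * rejected_logratio)  # ~-0.0002 -> Rewards/rejected
print(beta * (chosen_logratio - rejected_logratio))  # ~0.0208 -> Rewards/margins
```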
## Model description

More information needed
## Intended uses & limitations

More information needed
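Pending documentation from the author, here is a minimal, hypothetical inference sketch. The repo id is an assumption based on where this card appeared and may not match where the checkpoint actually lives; the chat template is inherited from the deepseek-llm-7b-chat base model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; verify before use.
repo_id = "yiran-wang3/ds_chat_sppo_hard_cosine_iter0_masked_cosine_schedule"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype="auto", device_map="auto"
)

# Format the prompt with the base model's chat template.
messages = [{"role": "user", "content": "Write a function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```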
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 8.0
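The effective batch size works out to 8 per device × 8 GPUs × 1 gradient-accumulation step = 64. Below is a minimal sketch of the implied optimizer/scheduler setup; the model and the total step count are placeholders, and note that when both warmup_ratio and warmup_steps are set, the HF Trainer uses warmup_steps:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Stand-in for the actual policy model; the results table below suggests
# roughly 2200 optimizer steps over 8 epochs (~276 steps/epoch).
model = torch.nn.Linear(8, 8)
num_training_steps = 2208  # assumption, inferred from the table below

# The card reports Adam with betas=(0.9, 0.999) and epsilon=1e-08; the HF
# Trainer's default optimizer is AdamW with exactly these settings.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-7, betas=(0.9, 0.999), eps=1e-8
)

# Cosine decay with 100 warmup steps (warmup_steps overrides warmup_ratio).
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=num_training_steps
)
```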
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Chosen Logits | Debug/policy Rejected Logits | Debug/policy Chosen Logps | Debug/policy Rejected Logps | Debug/reference Chosen Logps | Debug/reference Rejected Logps | Debug/sppo Chosen Reward In Loss | Debug/sppo Rej Reward In Loss | Debug/sppo Chosen Loss | Debug/sppo Reject Loss |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4999.5461 | 0.3623 | 100 | 4988.0952 | 0.0050 | 0.0020 | 0.2763 | 0.0031 | -63.6883 | -122.6432 | 1.7269 | 1.6642 | 1.6642 | 1.7269 | -122.6432 | -63.6883 | -123.1481 | -63.8871 | 0.5049 | 0.1988 | 2453.1523 | 2523.2144 |
5011.4531 | 0.7246 | 200 | 4990.5610 | 0.0177 | 0.0058 | 0.3158 | 0.0119 | -63.3097 | -121.3786 | 1.7330 | 1.6732 | 1.6732 | 1.7330 | -121.3786 | -63.3097 | -123.1481 | -63.8871 | 1.7695 | 0.5774 | 2386.0396 | 2582.6948 |
4987.3762 | 1.0870 | 300 | 4987.7910 | 0.0199 | 0.0061 | 0.2632 | 0.0137 | -63.2725 | -121.1585 | 1.7421 | 1.6830 | 1.6830 | 1.7421 | -121.1585 | -63.2725 | -123.1481 | -63.8871 | 1.9895 | 0.6145 | 2385.2695 | 2590.7976 |
5014.9531 | 1.4493 | 400 | 4983.8423 | 0.0200 | 0.0047 | 0.2632 | 0.0152 | -63.4148 | -121.1519 | 1.7308 | 1.6711 | 1.6711 | 1.7308 | -121.1519 | -63.4148 | -123.1481 | -63.8871 | 1.9962 | 0.4722 | 2383.6707 | 2565.9753 |
5006.941 | 1.8116 | 500 | 4965.4326 | 0.0117 | -0.0005 | 0.3158 | 0.0122 | -63.9328 | -121.9733 | 1.7113 | 1.6503 | 1.6503 | 1.7113 | -121.9733 | -63.9328 | -123.1481 | -63.8871 | 1.1748 | -0.0457 | 2416.3770 | 2495.6252 |
4945.2656 | 2.1739 | 600 | 4971.4199 | 0.0165 | 0.0030 | 0.2632 | 0.0134 | -63.5826 | -121.4996 | 1.7310 | 1.6724 | 1.6724 | 1.7310 | -121.4996 | -63.5826 | -123.1481 | -63.8871 | 1.6485 | 0.3045 | 2391.6709 | 2537.9797 |
5016.1723 | 2.5362 | 700 | 4956.6055 | 0.0193 | 0.0038 | 0.3684 | 0.0155 | -63.5097 | -121.2218 | 1.7528 | 1.6919 | 1.6919 | 1.7528 | -121.2218 | -63.5097 | -123.1481 | -63.8871 | 1.9263 | 0.3774 | 2372.3936 | 2549.7046 |
4980.475 | 2.8986 | 800 | 4967.6992 | 0.0217 | 0.0048 | 0.3421 | 0.0169 | -63.4108 | -120.9796 | 1.7533 | 1.6937 | 1.6937 | 1.7533 | -120.9796 | -63.4108 | -123.1481 | -63.8871 | 2.1685 | 0.4763 | 2370.3362 | 2566.8535 |
4962.825 | 3.2609 | 900 | 4973.9316 | 0.0239 | 0.0047 | 0.3026 | 0.0192 | -63.4168 | -120.7541 | 1.7347 | 1.6754 | 1.6754 | 1.7347 | -120.7541 | -63.4168 | -123.1481 | -63.8871 | 2.3940 | 0.4702 | 2374.9814 | 2564.9277 |
4960.6797 | 3.6232 | 1000 | 4954.9062 | 0.0185 | 0.0027 | 0.3553 | 0.0158 | -63.6219 | -121.2982 | 1.7363 | 1.6773 | 1.6773 | 1.7363 | -121.2982 | -63.6219 | -123.1481 | -63.8871 | 1.8498 | 0.2651 | 2376.7742 | 2531.5662 |
4996.0746 | 3.9855 | 1100 | 4978.2021 | 0.0089 | -0.0022 | 0.3684 | 0.0112 | -64.1119 | -122.2532 | 1.6884 | 1.6291 | 1.6291 | 1.6884 | -122.2532 | -64.1119 | -123.1481 | -63.8871 | 0.8949 | -0.2249 | 2438.2773 | 2479.8074 |
4988.032 | 4.3478 | 1200 | 4952.4019 | 0.0171 | -0.0003 | 0.3816 | 0.0174 | -63.9132 | -121.4333 | 1.7223 | 1.6634 | 1.6634 | 1.7223 | -121.4333 | -63.9132 | -123.1481 | -63.8871 | 1.7148 | -0.0261 | 2381.5840 | 2497.4338 |
4982.1008 | 4.7101 | 1300 | 4951.4316 | 0.0171 | -0.0003 | 0.3553 | 0.0174 | -63.9127 | -121.4370 | 1.7192 | 1.6602 | 1.6602 | 1.7192 | -121.4370 | -63.9127 | -123.1481 | -63.8871 | 1.7111 | -0.0257 | 2388.1934 | 2497.4824 |
4966.7375 | 5.0725 | 1400 | 4954.5615 | 0.0185 | 0.0008 | 0.3289 | 0.0177 | -63.8112 | -121.3000 | 1.7216 | 1.6631 | 1.6631 | 1.7216 | -121.3000 | -63.8112 | -123.1481 | -63.8871 | 1.8480 | 0.0759 | 2383.4727 | 2508.1672 |
4937.6176 | 5.4348 | 1500 | 4952.7949 | 0.0157 | -0.0019 | 0.3289 | 0.0176 | -64.0738 | -121.5761 | 1.7099 | 1.6508 | 1.6508 | 1.7099 | -121.5761 | -64.0738 | -123.1481 | -63.8871 | 1.5720 | -0.1868 | 2396.6667 | 2483.3738 |
4969.5398 | 5.7971 | 1600 | 4948.7925 | 0.0184 | -0.0001 | 0.3289 | 0.0186 | -63.8999 | -121.3049 | 1.7190 | 1.6601 | 1.6601 | 1.7190 | -121.3049 | -63.8999 | -123.1481 | -63.8871 | 1.8432 | -0.0128 | 2383.5056 | 2498.8604 |
4931.8516 | 6.1594 | 1700 | 4959.4023 | 0.0213 | 0.0026 | 0.2632 | 0.0188 | -63.6300 | -121.0142 | 1.7206 | 1.6597 | 1.6597 | 1.7206 | -121.0142 | -63.6300 | -123.1481 | -63.8871 | 2.1339 | 0.2570 | 2381.4475 | 2532.8616 |
4953.9797 | 6.5217 | 1800 | 4962.0317 | 0.0210 | 0.0004 | 0.2895 | 0.0206 | -63.8433 | -121.0445 | 1.7201 | 1.6602 | 1.6602 | 1.7201 | -121.0445 | -63.8433 | -123.1481 | -63.8871 | 2.1036 | 0.0438 | 2382.3406 | 2504.5334 |
4965.893 | 6.8841 | 1900 | 4953.7192 | 0.0187 | 0.0005 | 0.3289 | 0.0182 | -63.8390 | -121.2794 | 1.7207 | 1.6619 | 1.6619 | 1.7207 | -121.2794 | -63.8390 | -123.1481 | -63.8871 | 1.8687 | 0.0481 | 2383.2534 | 2505.0400 |
4950.5336 | 7.2464 | 2000 | 4958.1733 | 0.0211 | 0.0004 | 0.3158 | 0.0207 | -63.8483 | -121.0380 | 1.7193 | 1.6611 | 1.6611 | 1.7193 | -121.0380 | -63.8483 | -123.1481 | -63.8871 | 2.1101 | 0.0387 | 2382.7937 | 2504.2783 |
4966.3176 | 7.6087 | 2100 | 4951.5176 | 0.0195 | -0.0005 | 0.3816 | 0.0200 | -63.9397 | -121.2030 | 1.7190 | 1.6607 | 1.6607 | 1.7190 | -121.2030 | -63.9397 | -123.1481 | -63.8871 | 1.9451 | -0.0526 | 2381.8259 | 2494.8140 |
4946.1824 | 7.9710 | 2200 | 4957.3081 | 0.0206 | -0.0002 | 0.3026 | 0.0208 | -63.9058 | -121.0837 | 1.7198 | 1.6603 | 1.6603 | 1.7198 | -121.0837 | -63.9058 | -123.1481 | -63.8871 | 2.0643 | -0.0187 | 2387.4246 | 2498.1609 |
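The Debug/sppo loss columns look consistent with a squared ("hard") SPPO objective that pushes the chosen log-ratio toward +1/(2β) = +50 and the rejected log-ratio toward −50. A hedged sketch follows, assuming β = 0.01; the logged batch means sit somewhat above these point estimates (noticeably so on the chosen side), plausibly because the mean of per-example squares exceeds the square of the mean log-ratio:

```python
import torch

# Hedged reconstruction of the per-side SPPO losses logged above, assuming the
# squared-loss ("hard") SPPO objective with beta = 0.01, i.e. targets of
# +1/(2*beta) = +50 for chosen and -50 for rejected log-probability ratios.
beta = 0.01

def sppo_losses(chosen_logratio: torch.Tensor, rejected_logratio: torch.Tensor):
    chosen_loss = (chosen_logratio - 0.5 / beta) ** 2
    rejected_loss = (rejected_logratio + 0.5 / beta) ** 2
    return chosen_loss, rejected_loss

# Mean log-ratios from the last row of the table above:
c, r = sppo_losses(torch.tensor(2.0643), torch.tensor(-0.0187))
print(c)  # ~2297.8 vs. logged 2387.4 (gap ~ per-example variance)
print(r)  # ~2498.1 vs. logged 2498.2
```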
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1