killawhale2 committed
Commit: d9aae8f
Parent(s): ea64f23
Update README.md

README.md CHANGED
@@ -16,10 +16,10 @@ We developed the Depth Up-Scaling technique. Built on the Llama2 architecture, S
 Depth-Upscaled SOLAR-10.7B has remarkable performance. It outperforms models with up to 30B parameters, even surpassing the recent Mixtral 8X7B model. For detailed information, please refer to the experimental table ([link to be updated soon]).
 Solar 10.7B is an ideal choice for fine-tuning. SOLAR-10.7B offers robustness and adaptability for your fine-tuning needs. Our simple instruction fine-tuning using the SOLAR-10.7B pre-trained model yields significant performance improvements. [[link to be updated soon]]
 
-# **
+# **Instruction Fine-Tuning Strategy**
 
 We utilize state-of-the-art instruction fine-tuning methods including supervised fine-tuning (SFT) and direct preference optimization (DPO) [1].
-Using open source datasets with Alpaca- and OpenOrca-style and generated synthetic datasets, we apply
+Using open-source Alpaca- and OpenOrca-style datasets together with generated synthetic datasets, we apply iterative DPO training, a proprietary alignment strategy, to maximize the performance of the resulting model.
 
 [1] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C., 2023. Direct preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290.
 
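The added paragraph refers to the DPO objective from [1]. As a minimal, illustrative sketch of that objective only (not code from this commit; the function name, argument names, and the default `beta` are assumptions), in PyTorch:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct preference optimization loss (Rafailov et al., 2023).

    Each argument is a batch of summed per-token log-probabilities of the
    chosen / rejected responses under the policy being trained and under a
    frozen reference model.
    """
    # Log-ratio of policy to reference for the preferred and dispreferred responses.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # -log sigmoid(beta * (chosen margin - rejected margin)), averaged over the batch.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```

In the paper, `beta` controls how far the trained policy is allowed to drift from the reference model; how the SOLAR team schedules it across the iterative DPO rounds described above is not specified in this commit.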