killawhale2 committed
Commit 0dfb32e · Parent(s): 12acc01
Update README.md

README.md CHANGED
@@ -56,6 +56,26 @@ Using the datasets mentioned above, we applied SFT and iterative DPO training, a
 
 [2] Yu, L., Jiang, W., Shi, H., Yu, J., Liu, Z., Zhang, Y., Kwok, J.T., Li, Z., Weller, A. and Liu, W., 2023. Metamath: Bootstrap your own mathematical questions for large language models. arXiv preprint arXiv:2309.12284.
 
+# **Data Contamination Test Results**
+
+Recently, there have been contamination issues in some models on the LLM leaderboard.
+We note that we made every effort to exclude any benchmark-related datasets from training.
+We also ensured the integrity of our model by conducting a data contamination test [3] that is also used by the HuggingFace team [4, 5].
+
+Our results, with every `result < 0.1, %:` value well below 0.9, indicate that our model is free from contamination.
+
+*The data contamination test results of HellaSwag and Winogrande will be added once [3] supports them.*
+
+| Model                        | ARC   | MMLU  | TruthfulQA | GSM8K |
+|------------------------------|-------|-------|------------|-------|
+| **SOLAR-10.7B-Instruct-v1.0**| result < 0.1, %: 0.06 | result < 0.1, %: 0.15 | result < 0.1, %: 0.28 | result < 0.1, %: 0.70 |
+
+[3] https://github.com/swj0419/detect-pretrain-code-contamination
+
+[4] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/474#657f2245365456e362412a06
+
+[5] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/265#657b6debf81f6b44b8966230
+
 # **Evaluation Results**
 
 | Model | H6 | Model Size |
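For context, the detector in [3] is, to our understanding, built around a Min-K%-Prob-style membership score: score a candidate benchmark sample with the model under test and average the log-likelihoods of its least-likely tokens. The sketch below illustrates only that scoring idea; the function name, parameters, and example numbers are our own illustrative assumptions, not the repo's actual API.

```python
import numpy as np

def min_k_percent_prob(token_logprobs, k=0.1):
    """Min-K%-Prob-style score: mean log-likelihood of the k% least-likely
    tokens in a candidate text, as scored by the model under test.

    Text seen during training tends to contain fewer very-low-probability
    tokens, so a higher (less negative) score hints at contamination.
    """
    lps = np.sort(np.asarray(token_logprobs, dtype=float))  # ascending order
    n = max(1, int(len(lps) * k))  # number of lowest-probability tokens kept
    return float(lps[:n].mean())

# Illustrative per-token log-probs for one benchmark sample (made-up numbers).
score = min_k_percent_prob([-0.2, -1.5, -0.1, -6.3, -0.4, -2.8], k=0.5)
```

In [3], per-sample scores like this are aggregated over a benchmark into a single statistic; our reading of the `result < 0.1, %` columns above is that values near 1 flag likely contamination, which is why the table reports them against the 0.9 threshold. See [3] for the exact definition.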