Update README.md
README.md
CHANGED
@@ -34,7 +34,7 @@ Somehow, model evaluation is a kind of metaphysics. Different models are sensiti
It is impractical for us to manually set a specific configuration for each fine-tuned model, because a real LLM should retain universal capability regardless of the parameters manipulated by users.

Thus, OpenCSG racked our brains to provide a relatively fair method to compare the fine-tuned models on the HumanEval benchmark.
- To simplify the
+ To simplify the comparison, we chose the Pass@1 metric on the Python language, although our fine-tuning dataset includes samples in multiple languages.

**For fairness, we evaluated the fine-tuned and original StarCoder models with only the original cases' prompts, without any additional instructions.**
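For reference, Pass@1 here can be computed with the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021). Below is a minimal sketch in Python; the per-problem counts (`num_samples`, `num_correct`) are purely illustrative, not results from this evaluation.

```python
from math import comb

def pass_at_k(num_samples: int, num_correct: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k
    generations drawn (without replacement) from num_samples passes."""
    if num_samples - num_correct < k:
        return 1.0
    return 1.0 - comb(num_samples - num_correct, k) / comb(num_samples, k)

# Hypothetical counts: 3 HumanEval problems, 20 generations each,
# with 4, 0, and 20 generations passing the unit tests respectively.
per_problem = [(20, 4), (20, 0), (20, 20)]

# With k = 1 this reduces to the average fraction of passing generations.
pass_at_1 = sum(pass_at_k(n, c, 1) for n, c in per_problem) / len(per_problem)
print(f"Pass@1 = {pass_at_1:.3f}")  # 0.400
```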