Update README.md
README.md (CHANGED)
@@ -36,7 +36,7 @@ It is impractical for us to manually set specific configurations for each fine-tuned
 Therefore, OpenCSG racked their brains to provide a relatively fair method to compare the fine-tuned models on the HumanEval benchmark.
 To simplify the comparison, we chose the Pass@1 metric for the Python language, but our fine-tuning dataset includes samples in multiple languages.

-**For fairness, we evaluated the original and fine-tuned
+**For fairness, we evaluated the original and fine-tuned StarCoder models based only on the prompts from the original cases, without including any other instructions.**

 **Additionally, we used greedy decoding for each model during evaluation.**
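Concretely, the protocol pinned down above (original HumanEval prompts only, greedy decoding, one completion per task) could look like the minimal sketch below. It assumes the Hugging Face `transformers` library and OpenAI's `human-eval` scoring package; the model ID is a placeholder for whichever original or fine-tuned checkpoint is being scored.

```python
# Minimal sketch of the stated protocol: unmodified HumanEval prompts,
# greedy decoding, one completion per task, scored as Pass@1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from human_eval.data import read_problems, write_jsonl

MODEL_ID = "bigcode/starcoder"  # placeholder: swap in each model under test

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"  # device_map needs `accelerate`
)

samples = []
for task_id, problem in read_problems().items():
    # Feed the original prompt as-is: no system message, no extra instructions.
    inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)  # greedy decoding
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    samples.append({"task_id": task_id, "completion": completion})

# A production harness would also truncate each completion at HumanEval's
# stop sequences before scoring. Score the file with:
#   evaluate_functional_correctness samples.jsonl
write_jsonl("samples.jsonl", samples)
```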
@@ -142,7 +142,7 @@ HumanEval is the most common benchmark for evaluating a model's code-generation performance, especially
 Therefore, OpenCSG provides a relatively fair method for comparing the fine-tuned models on the HumanEval benchmark.
 For convenience, we chose the Pass@1 metric for the Python language, but note that our fine-tuning dataset contains multiple programming languages.

-**For fairness, we evaluated the original and fine-tuned
+**For fairness, we evaluated the original and fine-tuned StarCoder models based only on the prompts of the original problems, without any other instructions.**

 **In addition, we used greedy decoding for each model during evaluation.**
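For reference, under greedy decoding Pass@1 reduces to the share of the 164 HumanEval problems whose single completion passes all unit tests; the general unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021) covers the sampled case as well. A small sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper:
    1 - C(n-c, k) / C(n, k), where n is the number of samples generated
    for a problem and c the number that pass its tests. Computed as a
    running product for numerical stability."""
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# With greedy decoding there is one sample per problem (n = 1, k = 1),
# so pass_at_k is 1.0 if that sample passes and 0.0 otherwise; a model's
# Pass@1 is the mean of these values over all HumanEval problems.
```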