AmeliaYin committed
Commit 36642a5
1 Parent(s): 455a03b

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -36,7 +36,7 @@ It is impractical for us to manually set specific configurations for each fine-tuned model.
 Therefore, OpenCSG has devised a relatively fair method to compare the fine-tuned models on the HumanEval benchmark.
 To simplify the comparison, we chose the Pass@1 metric for Python, but note that our fine-tuning dataset includes samples in multiple languages.
 
-**For fairness, we evaluated the original and fine-tuned CodeLlama models based only on the prompts from the original cases, without including any other instructions.**
+**For fairness, we evaluated the original and fine-tuned StarCoder models based only on the prompts from the original cases, without including any other instructions.**
 
 **In addition, we use greedy decoding for each model during evaluation.**
 
@@ -142,7 +142,7 @@ HumanEval is the most common benchmark for evaluating model performance in code generation, especially
 Therefore, OpenCSG provides a relatively fair method for comparing the fine-tuned models on the HumanEval benchmark.
 For convenience, we chose the Pass@1 metric for Python, but note that our fine-tuning dataset contains multiple programming languages.
 
-**For fairness, we evaluated the original and fine-tuned CodeLlama models based only on the prompts of the original problems, without including any other instructions.**
+**For fairness, we evaluated the original and fine-tuned StarCoder models based only on the prompts of the original problems, without including any other instructions.**
 
 **In addition, we use greedy decoding for each model during evaluation.**
 
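For context on the metric referenced in both hunks: Pass@1 follows the HumanEval protocol of Chen et al. (2021). Below is a minimal sketch of the unbiased pass@k estimator from that paper; with greedy decoding only one completion is sampled per problem, so pass@1 reduces to the fraction of problems whose single completion passes all unit tests. The `results` list is a hypothetical placeholder, not OpenCSG's actual data.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: completions sampled per problem
    c: completions that pass all unit tests
    k: the k in pass@k
    """
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# With greedy decoding, n = 1 per problem, so pass@1 is simply the pass rate.
results = [True, False, True, True]  # hypothetical per-problem outcomes
print(sum(pass_at_k(1, int(r), 1) for r in results) / len(results))  # 0.75
```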
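Likewise, the prompt-only, greedy-decoding setup described in the diff might look roughly like the sketch below, using the Hugging Face transformers API. The checkpoint name and prompt are illustrative assumptions, not the exact harness OpenCSG used.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; substitute the original or fine-tuned model under test.
model_name = "bigcode/starcoderbase"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# The HumanEval prompt is passed verbatim, with no system prompt or extra instructions.
prompt = 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n'

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=False makes generate() pick the argmax token at each step, i.e. greedy decoding.
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(prompt + completion)
```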