internlm/internlm2_5-7b

@@ -33,15 +33,14 @@ While maintaining the InternLM2 architecture, various new technical explorations
 We have evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass). Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
-| Benchmark       | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
-|-----------------|----------------|--------------|-----------|-----------|
-| MMLU            | 71.6           | 65.8         | 66.4      | 71.6      |
-| CMMLU           | 79.1           | 66.2         | 51.0      | 74.1      |
-| BBH             | 70.1           | 65.0         | 59.7      | 71.1      |
-| GSM8K           | 74.8           | 70.8         | 54.3      | 74.5      |
-| MATH            | 34.0           | 20.2         | 16.4      | 31.9      |
-| HumanEval       | 59.8           | 43.3         | 28.1      | 61.0      |
-| MBPP(Sanitized) | 54.9           | 51.8         | 54.9      | 58.8      |
+| Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
+|-----------|----------------|--------------|-----------|-----------|
+| MMLU      | 71.6           | 65.8         | 66.4      | 71.6      |
+| CMMLU     | 79.1           | 66.2         | 51.0      | 74.1      |
+| BBH       | 70.1           | 65.0         | 59.7      | 71.1      |
+| MATH      | 34.0           | 20.2         | 16.4      | 31.9      |
+| GSM8K     | 74.8           | 70.8         | 54.3      | 74.5      |
+| GPQA      | 31.3           | 28.3         | 31.3      | 27.8      |
 - The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass), and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
 - The evaluation data may differ numerically across versions of [OpenCompass](https://github.com/open-compass/opencompass), so please refer to the latest evaluation results of [OpenCompass](https://github.com/open-compass/opencompass).
@@ -99,15 +98,14 @@ The code is licensed under Apache-2.0, while model weights are fully open for ac
 We evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
-| Benchmark       | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
-|-----------------|----------------|--------------|-----------|-----------|
-| MMLU            | 71.6           | 65.8         | 66.4      | 71.6      |
-| CMMLU           | 79.1           | 66.2         | 51.0      | 74.1      |
-| BBH             | 70.1           | 65.0         | 59.7      | 71.1      |
-| GSM8K           | 74.8           | 70.8         | 54.3      | 74.5      |
-| MATH            | 34.0           | 20.2         | 16.4      | 31.9      |
-| HumanEval       | 59.8           | 43.3         | 28.1      | 61.0      |
-| MBPP(Sanitized) | 54.9           | 51.8         | 54.9      | 58.8      |
+| Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
+|-----------|----------------|--------------|-----------|-----------|
+| MMLU      | 71.6           | 65.8         | 66.4      | 71.6      |
+| CMMLU     | 79.1           | 66.2         | 51.0      | 74.1      |
+| BBH       | 70.1           | 65.0         | 59.7      | 71.1      |
+| MATH      | 34.0           | 20.2         | 16.4      | 31.9      |
+| GSM8K     | 74.8           | 70.8         | 54.3      | 74.5      |
+| GPQA      | 31.3           | 28.3         | 31.3      | 27.8      |
 - The above results were obtained with [OpenCompass](https://github.com/open-compass/opencompass) (entries marked with `*` are taken from the original papers); see the configuration files provided in [OpenCompass](https://github.com/open-compass/opencompass) for test details.
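The updated table reports per-benchmark scores only. As a hedged illustration of how its columns can be compared, the sketch below hard-codes the six rows of the new (`+`) table and computes two derived quantities that the README itself does not report: an unweighted mean score per model and the per-benchmark gain of InternLM2.5-7B over InternLM2-7B. The helper names `unweighted_mean` and `gains` are hypothetical, and an unweighted mean treats benchmarks of very different difficulty as equal, so it is a rough summary, not an OpenCompass metric.

```python
# Scores copied from the updated (+) benchmark table; one column per model.
MODELS = ["InternLM2.5-7B", "InternLM2-7B", "LLaMA3-8B", "Yi-1.5-9B"]
SCORES = {
    "MMLU":  [71.6, 65.8, 66.4, 71.6],
    "CMMLU": [79.1, 66.2, 51.0, 74.1],
    "BBH":   [70.1, 65.0, 59.7, 71.1],
    "MATH":  [34.0, 20.2, 16.4, 31.9],
    "GSM8K": [74.8, 70.8, 54.3, 74.5],
    "GPQA":  [31.3, 28.3, 31.3, 27.8],
}


def column(model: str) -> list[float]:
    """Return one model's scores across all benchmarks, in table order."""
    idx = MODELS.index(model)
    return [row[idx] for row in SCORES.values()]


def unweighted_mean(model: str) -> float:
    """Hypothetical summary: plain average over benchmarks, rounded to 2 decimals."""
    vals = column(model)
    return round(sum(vals) / len(vals), 2)


def gains(new: str, old: str) -> dict[str, float]:
    """Per-benchmark score difference (new - old), rounded to 1 decimal."""
    i, j = MODELS.index(new), MODELS.index(old)
    return {bench: round(row[i] - row[j], 1) for bench, row in SCORES.items()}


if __name__ == "__main__":
    for m in MODELS:
        print(f"{m:15s} mean={unweighted_mean(m):.2f}")
    print(gains("InternLM2.5-7B", "InternLM2-7B"))
```

With these numbers, the unweighted mean is 60.15 for InternLM2.5-7B versus 52.72 for InternLM2-7B, and the largest per-benchmark gains are on MATH (+13.8) and CMMLU (+12.9).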