Update README.md
README.md
CHANGED
@@ -33,15 +33,14 @@ While maintaining the InternLM2 architecture, various new technical explorations
 
 We have evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass). Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
-| Benchmark
-|
-| MMLU
-| CMMLU
-| BBH
-|
-|
-|
-| MBPP(Sanitized) | 54.9 | 51.8 | 54.9 | 58.8 |
+| Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
+|-----------|----------------|--------------|-----------|-----------|
+| MMLU | 71.6 | 65.8 | 66.4 | 71.6 |
+| CMMLU | 79.1 | 66.2 | 51.0 | 74.1 |
+| BBH | 70.1 | 65.0 | 59.7 | 71.1 |
+| MATH | 34.0 | 20.2 | 16.4 | 31.9 |
+| GSM8K | 74.8 | 70.8 | 54.3 | 74.5 |
+| GPQA | 31.3 | 28.3 | 31.3 | 27.8 |
 
 - The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass), and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
 - The evaluation data may have numerical differences due to the version iteration of [OpenCompass](https://github.com/open-compass/opencompass), so please refer to the latest evaluation results of [OpenCompass](https://github.com/open-compass/opencompass).
@@ -99,15 +98,14 @@ The code is licensed under Apache-2.0, while model weights are fully open for ac
 
 We used the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/) to evaluate InternLM2.5 on several important benchmarks. Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
-| Benchmark
-|
-| MMLU
-| CMMLU
-| BBH
-|
-|
-|
-| MBPP(Sanitized) | 54.9 | 51.8 | 54.9 | 58.8 |
+| Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
+|-----------|----------------|--------------|-----------|-----------|
+| MMLU | 71.6 | 65.8 | 66.4 | 71.6 |
+| CMMLU | 79.1 | 66.2 | 51.0 | 74.1 |
+| BBH | 70.1 | 65.0 | 59.7 | 71.1 |
+| MATH | 34.0 | 20.2 | 16.4 | 31.9 |
+| GSM8K | 74.8 | 70.8 | 54.3 | 74.5 |
+| GPQA | 31.3 | 28.3 | 31.3 | 27.8 |
 
 
 - The evaluation results above were obtained with [OpenCompass](https://github.com/open-compass/opencompass) (data marked with `*` are taken from the original papers). Test details can be found in the configuration files provided in [OpenCompass](https://github.com/open-compass/opencompass).
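The hunk headers in this commit (`@@ -33,15 +33,14 @@` and `@@ -99,15 +98,14 @@`) follow the unified diff convention `@@ -old_start,old_len +new_start,new_len @@`, where the old length counts context plus removed lines and the new length counts context plus added lines. In each hunk here, 6 context lines plus 9 removed table rows give 15, and the same 6 context lines plus 8 added rows give 14. The Python sketch below checks that arithmetic; `parse_hunk_header` and `count_hunk_lines` are hypothetical helpers written for illustration, not part of Git or any diff library.

```python
import re

# Matches "@@ -old_start[,old_len] +new_start[,new_len] @@" at the start of a line.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_len, new_start, new_len) from a unified diff hunk header."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = m.groups()
    # In unified format an omitted length means the range covers a single line.
    return int(old_start), int(old_len or 1), int(new_start), int(new_len or 1)


def count_hunk_lines(body):
    """Count lines belonging to the old file (' ' or '-') and the new file (' ' or '+')."""
    old = sum(1 for line in body if line[:1] in (" ", "-"))
    new = sum(1 for line in body if line[:1] in (" ", "+"))
    return old, new


# First hunk of this commit: 3 context lines, 9 removed rows, 8 added rows, 3 context lines.
header = "@@ -33,15 +33,14 @@ While maintaining the InternLM2 architecture"
body = [" "] * 3 + ["-"] * 9 + ["+"] * 8 + [" "] * 3
_, old_len, _, new_len = parse_hunk_header(header)
assert count_hunk_lines(body) == (old_len, new_len)
```

The same check applies to the second hunk, whose header differs only in its start lines (99 in the old file, 98 in the new one).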