x54-729 committed
Commit 7a87356
1 Parent(s): 9b45de1

Update README.md

Files changed (1)
  1. README.md +16 -18
README.md CHANGED
@@ -33,15 +33,14 @@ While maintaining the InternLM2 architecture, various new technical explorations
 
  We have evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass). Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
- | Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
- |-----------------|----------------|--------------|-----------|-----------|
- | MMLU | 71.6 | 65.8 | 66.4 | 71.6 |
- | CMMLU | 79.1 | 66.2 | 51.0 | 74.1 |
- | BBH | 70.1 | 65.0 | 59.7 | 71.1 |
- | GSM8K | 74.8 | 70.8 | 54.3 | 74.5 |
- | MATH | 34.0 | 20.2 | 16.4 | 31.9 |
- | HumanEval | 59.8 | 43.3 | 28.1 | 61.0 |
- | MBPP(Sanitized) | 54.9 | 51.8 | 54.9 | 58.8 |
 
  - The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass) , and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
  - The evaluation data may have numerical differences due to the version iteration of [OpenCompass](https://github.com/open-compass/opencompass), so please refer to the latest evaluation results of [OpenCompass](https://github.com/open-compass/opencompass).
@@ -99,15 +98,14 @@ The code is licensed under Apache-2.0, while model weights are fully open for ac
 
  We evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). Some of the results are shown in the table below; you are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
- | Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
- |-----------------|----------------|--------------|-----------|-----------|
- | MMLU | 71.6 | 65.8 | 66.4 | 71.6 |
- | CMMLU | 79.1 | 66.2 | 51.0 | 74.1 |
- | BBH | 70.1 | 65.0 | 59.7 | 71.1 |
- | GSM8K | 74.8 | 70.8 | 54.3 | 74.5 |
- | MATH | 34.0 | 20.2 | 16.4 | 31.9 |
- | HumanEval | 59.8 | 43.3 | 28.1 | 61.0 |
- | MBPP(Sanitized) | 54.9 | 51.8 | 54.9 | 58.8 |
 
 
  - The evaluation results above were obtained with [OpenCompass](https://github.com/open-compass/opencompass) (values marked with `*` are taken from the original papers); for testing details, see the configuration files provided in [OpenCompass](https://github.com/open-compass/opencompass).
 
 
  We have evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/open-compass/opencompass). Some of the evaluation results are shown in the table below. You are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
+ | Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
+ |-----------|----------------|--------------|-----------|-----------|
+ | MMLU | 71.6 | 65.8 | 66.4 | 71.6 |
+ | CMMLU | 79.1 | 66.2 | 51.0 | 74.1 |
+ | BBH | 70.1 | 65.0 | 59.7 | 71.1 |
+ | MATH | 34.0 | 20.2 | 16.4 | 31.9 |
+ | GSM8K | 74.8 | 70.8 | 54.3 | 74.5 |
+ | GPQA | 31.3 | 28.3 | 31.3 | 27.8 |
 
  - The evaluation results were obtained from [OpenCompass](https://github.com/open-compass/opencompass) , and evaluation configuration can be found in the configuration files provided by [OpenCompass](https://github.com/open-compass/opencompass).
  - The evaluation data may have numerical differences due to the version iteration of [OpenCompass](https://github.com/open-compass/opencompass), so please refer to the latest evaluation results of [OpenCompass](https://github.com/open-compass/opencompass).
 
 
  We evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/). Some of the results are shown in the table below; you are welcome to visit the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for more evaluation results.
 
+ | Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
+ |-----------|----------------|--------------|-----------|-----------|
+ | MMLU | 71.6 | 65.8 | 66.4 | 71.6 |
+ | CMMLU | 79.1 | 66.2 | 51.0 | 74.1 |
+ | BBH | 70.1 | 65.0 | 59.7 | 71.1 |
+ | MATH | 34.0 | 20.2 | 16.4 | 31.9 |
+ | GSM8K | 74.8 | 70.8 | 54.3 | 74.5 |
+ | GPQA | 31.3 | 28.3 | 31.3 | 27.8 |
 
 
  - The evaluation results above were obtained with [OpenCompass](https://github.com/open-compass/opencompass) (values marked with `*` are taken from the original papers); for testing details, see the configuration files provided in [OpenCompass](https://github.com/open-compass/opencompass).
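
For readers who want to check the score differences implied by the updated table, the following is a minimal sketch that parses the Markdown rows added in this commit and computes the per-benchmark gain of InternLM2.5-7B over InternLM2-7B. The helper names `parse_table` and `gains` are hypothetical and introduced only for illustration; they are not part of OpenCompass or of the README.

```python
# Table rows copied verbatim from the "+" side of this commit.
TABLE = """
| Benchmark | InternLM2.5-7B | InternLM2-7B | LLaMA3-8B | Yi-1.5-9B |
|-----------|----------------|--------------|-----------|-----------|
| MMLU | 71.6 | 65.8 | 66.4 | 71.6 |
| CMMLU | 79.1 | 66.2 | 51.0 | 74.1 |
| BBH | 70.1 | 65.0 | 59.7 | 71.1 |
| MATH | 34.0 | 20.2 | 16.4 | 31.9 |
| GSM8K | 74.8 | 70.8 | 54.3 | 74.5 |
| GPQA | 31.3 | 28.3 | 31.3 | 27.8 |
"""


def parse_table(text):
    """Hypothetical helper: map benchmark name -> {model name: score}."""
    rows = [line.strip().strip("|").split("|") for line in text.strip().splitlines()]
    rows = [[cell.strip() for cell in row] for row in rows]
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator row
    return {row[0]: dict(zip(header[1:], map(float, row[1:]))) for row in body}


def gains(scores, new="InternLM2.5-7B", old="InternLM2-7B"):
    """Hypothetical helper: score difference (new - old), rounded to one decimal."""
    return {bench: round(s[new] - s[old], 1) for bench, s in scores.items()}


if __name__ == "__main__":
    for bench, gain in gains(parse_table(TABLE)).items():
        print(f"{bench}: {gain:+.1f}")
```

On the table as committed, the largest differences are on MATH (+13.8) and CMMLU (+12.9). Because the scores depend on the OpenCompass version used, these differences are only as current as the table itself.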