yutaozhu94 committed
Commit 85959c2 • Parent: 2d43918
Update README.md

README.md CHANGED
```diff
@@ -35,9 +35,9 @@ Due to the license limitation, for models based on LLaMA, we only provide the we
 
 ## Evaluation
 
-We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The evaluation results are shown as follows.
+We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The evaluation results are shown as follows.
 
-> 我们在中英文的一些基准测试上对YuLan-Chat
+> 我们在中英文的一些基准测试上对YuLan-Chat进行了评价,其结果如下。
 
 ### MMLU
 
@@ -47,8 +47,8 @@ We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The
 
 | Model                             | STEM | Social Science | Humanities | Others | Avg. |
 | --------------------------------- | :--: | :------------: | :--------: | :----: | :--: |
-| YuLan-Chat-1-13B-v1               | |
-| YuLan-Chat-1-65B-v1               | |
+| YuLan-Chat-1-13B-v1               | 39.6 |      57.8      |    42.6    |  57.6  | 49.4 |
+| YuLan-Chat-1-65B-v1               | 49.2 |      71.7      |    57.7    |  66.7  | 61.3 |
 | YuLan-Chat-1-65B-v2               | 46.3 |      67.9      |    56.9    |  63.9  | 58.7 |
 | LLaMA-2-13B                       | 44.6 |      64.2      |    53.9    |  62.2  | 56.2 |
 | FlagAlpha/Llama2-Chinese-13b-Chat | 44.4 |      63.2      |    51.6    |  60.6  | 55.0 |
@@ -63,8 +63,8 @@ We evaluate our YuLan-Chat model on several Chinese and English benchmarks. The
 
 | Model                             | STEM | Social Science | Humanities | Others | Avg. | Avg. (Hard) |
 | --------------------------------- | :--: | :------------: | :--------: | :----: | :--: | :---------: |
-| YuLan-Chat-1-13B-v1               | |
-| YuLan-Chat-1-65B-v1               | 37.
+| YuLan-Chat-1-13B-v1               | 30.2 |      37.4      |    31.9    |  30.7  | 32.0 |    25.7     |
+| YuLan-Chat-1-65B-v1               | 37.7 |      46.1      |    36.8    |  38.0  | 39.2 |    31.1     |
 | YuLan-Chat-1-65B-v2               | 39.9 |      55.9      |    47.7    |  43.7  | 45.4 |    31.4     |
 | LLaMA-2-13B                       | 36.9 |      43.2      |    37.6    |  36.6  | 38.2 |    32.0     |
 | FlagAlpha/Llama2-Chinese-13b-Chat | 36.8 |      44.5      |    36.3    |  36.5  | 38.1 |    30.9     |
```
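As a side note on the Avg. columns above: a minimal sketch of how a per-category table like this is commonly aggregated, using the YuLan-Chat-1-65B-v2 MMLU row. The unweighted macro-average here is an assumption for illustration; the README does not state its weighting scheme, and the small gap against the reported value suggests the actual Avg. may be weighted by question count per category.

```python
# Hypothetical sketch: macro-averaging per-category benchmark accuracies.
# Category scores are copied from the YuLan-Chat-1-65B-v2 MMLU row above;
# the unweighted-mean assumption is ours, not stated in the README.
scores = {"STEM": 46.3, "Social Science": 67.9, "Humanities": 56.9, "Others": 63.9}

avg = sum(scores.values()) / len(scores)
print(avg)  # prints 58.75; the README reports Avg. 58.7
```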