Update README.md

README.md (CHANGED)

Each hunk below replaces a blank line sitting between a table's header row and its body with the markdown delimiter row (e.g. `|:---|:---:|`) that tables need in order to render; the colons control column alignment.
@@ -95,7 +95,7 @@ Qwen-7B模型规模基本情况如下所示:
The details of the model architecture of Qwen-7B are listed as follows:

| Hyperparameter | Value |
-
+|:----------------|:-------|
| n_layers | 32 |
| n_heads | 32 |
| d_model | 4096 |

@@ -146,7 +146,7 @@ For pre-training data, on the one hand, Qwen-7B uses part of the open-source gen
The accuracy comparison of Qwen-7B and the other models on the C-Eval validation set is shown as follows:

| Model | Avg. |
-
+|:----------------|:--------:|
| Alpaca-7B | 28.9 |
| Vicuna-7B | 31.2 |
| ChatGLM-6B | 37.1 |

@@ -162,7 +162,7 @@ The accuracy comparison of Qwen-7B and the other models on the C-Eval validation
The performance comparison of Qwen-7B and other models on the C-Eval test set is shown in the following table:

| Model | Avg. | Avg. (Hard) | STEM | Social Sciences | Humanities | Others |
-
+|:--------------|:------:|:------:|:------:|:------:|:------:|:------:|
| ChatGLM-6B | 38.9 | 29.2 | 33.3 | 48.3 | 41.3 | 38.0 |
| Chinese-Alpaca-Plus-13B | 41.5 | 30.5 | 36.6 | 49.7 | 43.1 | 41.2 |
| Baichuan-7B | 42.8 | 31.5 | 38.2 | 52.0 | 46.2 | 39.3 |

@@ -191,7 +191,7 @@ Qwen-7B在MMLU 5-shot准确率表现如下表:
[MMLU](https://arxiv.org/abs/2009.03300) is currently one of the most recognized benchmarks for evaluating English comprehension abilities, covering 57 subtasks across different academic fields and difficulty levels. The MMLU 5-shot accuracy performance of Qwen-7B is shown in the following table:

| Model | Avg. | STEM | Social Sciences | Humanities | Others |
-
+|:--------------|:------:|:------:|:------:|:------:|:------:|
| LLaMA-7B | 35.1 | 30.5 | 38.3 | 34.0 | 38.1 |
| Baichuan-7B | 42.3 | 35.6 | 48.9 | 38.4 | 48.1 |
| LLaMA2-7B | 45.3 | 36.4 | 51.2 | 42.9 | 52.2 |

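The C-Eval and MMLU scores in these hunks are few-shot multiple-choice accuracies. The diff does not show how answers were scored, so purely as a hedged illustration: a common recipe is to build the 5-shot prompt plus the question and compare the model's next-token likelihood of each answer letter. A minimal sketch against the Hugging Face `transformers` API (model/tokenizer loading omitted; the prompt format and letter tokenization are assumptions):

```python
import torch

def pick_choice(model, tokenizer, prompt: str, choices=("A", "B", "C", "D")) -> str:
    """Score a multiple-choice item by the model's next-token logit for
    each answer letter and return the highest-scoring letter."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # shape: [vocab_size]
    # Token id of each letter as it would follow "Answer:" plus a space.
    letter_ids = [tokenizer(f" {c}", add_special_tokens=False).input_ids[-1]
                  for c in choices]
    scores = {c: next_token_logits[i].item() for c, i in zip(choices, letter_ids)}
    return max(scores, key=scores.get)
```
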
@@ -214,7 +214,7 @@ In terms of English, Qwen-7B also surpasses other similar open-source pre-traine
We compared the code capabilities of pre-trained models on [HumanEval](https://github.com/openai/human-eval), and the results are as follows:

| Model | Pass@1 |
-
+|:--------------|:------:|
| Baichuan-7B | 9.2 |
| ChatGLM2-6B | 9.2 |
| InternLM-7B | 10.4 |

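For context on the Pass@1 column: HumanEval results are conventionally computed with the unbiased pass@k estimator from the HumanEval paper, pass@k = 1 - C(n-c, k)/C(n, k) for n generated samples of which c pass the unit tests. The diff does not state n or the sampling temperature used here, so those remain unknown; a minimal sketch of the estimator itself:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k), evaluated as a
    numerically stable product as in the HumanEval reference code."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# For k = 1 this reduces to c / n, the fraction of samples that pass.
print(pass_at_k(n=10, c=1, k=1))  # 0.1
```
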
@@ -233,7 +233,7 @@ We compared the code capabilities of pre-trained models on [HumanEval](https://g
We compared the math capabilities of pre-trained models on [GSM8K](https://github.com/openai/grade-school-math) (8-shot), and the results are as follows:

| Model | Acc. |
-
+|:--------------|:------:|
| MPT-7B | 6.8 |
| Falcon-7B | 6.8 |
| Baichuan-7B | 9.7 |

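GSM8K is scored by exact match on the final numeric answer of each solution. The scoring harness behind these numbers is not shown, so the following is only a common extraction heuristic, not necessarily the one used: take the last number in the generated solution and compare it with the reference answer.

```python
import re

def extract_final_number(completion: str) -> str | None:
    """Return the last number in a completion (commas stripped), a common
    heuristic for scoring few-shot GSM8K generations."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return numbers[-1].replace(",", "") if numbers else None

assert extract_final_number("9 * 2 = 18, so she earns $18. The answer is 18.") == "18"
```
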
@@ -254,7 +254,7 @@ We compared the math capabilities of pre-trained models on [GSM8K](https://githu
We compared the translation capabilities of pre-trained models on [WMT22](https://www.statmt.org/wmt22/translation-task.html) zh-en and en-zh (5-shot BLEU), and the results are as follows:

| Model | Avg. | zh-en | en-zh |
-
+|:------------|:--------:|:--------:|:--------:|
| InternLM-7B | 11.8 | 9.0 | 14.5 |
| LLaMA-7B | 12.7 | 16.7 | 8.7 |
| LLaMA-13B | 15.8 | 19.5 | 12.0 |

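WMT translation quality is conventionally reported as sacrebleu corpus BLEU, though the diff does not say which scorer or tokenizer produced these numbers. A minimal sketch with the `sacrebleu` package (the toy sentences and the character-level Chinese tokenizer for en-zh are assumptions):

```python
from sacrebleu.metrics import BLEU

hypotheses = ["The weather is very nice today."]  # model outputs, one per source
references = [["The weather is nice today."]]     # one stream of references

# zh-en: the default "13a" tokenizer is typical for English output.
print(BLEU().corpus_score(hypotheses, references).score)

# en-zh: Chinese output is usually scored with character-level tokenization.
zh_bleu = BLEU(tokenize="zh")
```
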
@@ -329,7 +329,7 @@ model = AutoModelForCausalLM.from_pretrained(
With this method, you can load Qwen-7B in `NF4` or `Int8`, which reduces memory usage. We provide the relevant model-performance statistics below: quantization degrades accuracy slightly but significantly improves inference efficiency and reduces memory costs.

| Precision | MMLU | Memory |
-
+| :--------- | :-------: | :-----: |
| BF16 | 56.7 | 16.2G |
| Int8 | 52.8 | 10.1G |
| NF4 | 48.9 | 7.4G |

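The context line of this hunk shows the model being loaded via `AutoModelForCausalLM.from_pretrained`. The full argument list at file line 329 is not visible in the diff, so the following is only a sketch of what NF4 loading can look like with `transformers` and `bitsandbytes`; the exact configuration is an assumption:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization; use BitsAndBytesConfig(load_in_8bit=True) for Int8.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",
    quantization_config=quant_config,
    device_map="auto",       # spread layers across available devices
    trust_remote_code=True,  # Qwen ships custom modeling code
)
```
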