Commit b0d5375 by yangapku (1 parent: 7e110da)

Update README.md

Files changed (1):
  1. README.md +8 -8
README.md CHANGED
@@ -95,7 +95,7 @@ Qwen-7B模型规模基本情况如下所示:
 The details of the model architecture of Qwen-7B are listed as follows:
 
 | Hyperparameter | Value |
-|:---------------:|-------:|
+|:----------------|:-------|
 | n_layers | 32 |
 | n_heads | 32 |
 | d_model | 4096 |
@@ -146,7 +146,7 @@ For pre-training data, on the one hand, Qwen-7B uses part of the open-source gen
 The accuracy comparison of Qwen-7B and the other models on the C-Eval validation set is shown as follows:
 
 | Model | Avg. |
-|:---------------:|---------:|
+|:----------------|:--------:|
 | Alpaca-7B | 28.9 |
 | Vicuna-7B | 31.2 |
 | ChatGLM-6B | 37.1 |
@@ -162,7 +162,7 @@ The accuracy comparison of Qwen-7B and the other models on the C-Eval validation
 The performance comparison of Qwen-7B and other models on the C-Eval test set is shown in the following table:
 
 | Model | Avg. | Avg. (Hard) | STEM | Social Sciences | Humanities | Others |
-|:--------------:|------:|------:|------:|------:|------:|------:|
+|:--------------|:------:|:------:|:------:|:------:|:------:|:------:|
 | ChatGLM-6B | 38.9 | 29.2 | 33.3 | 48.3 | 41.3 | 38.0 |
 | Chinese-Alpaca-Plus-13B | 41.5 | 30.5 | 36.6 | 49.7 | 43.1 | 41.2 |
 | Baichuan-7B | 42.8 | 31.5 | 38.2 | 52.0 | 46.2 | 39.3 |
@@ -191,7 +191,7 @@ Qwen-7B在MMLU 5-shot准确率表现如下表:
 [MMLU](https://arxiv.org/abs/2009.03300) is currently one of the most recognized benchmarks for evaluating English comprehension abilities, covering 57 subtasks across different academic fields and difficulty levels. The MMLU 5-shot accuracy performance of Qwen-7B is shown in the following table:
 
 | Model | Avg. | STEM | Social Sciences | Humanities | Others |
-|:--------------:|------:|------:|------:|------:|
+|:--------------|:------:|:------:|:------:|:------:|
 | LLaMA-7B | 35.1 | 30.5 | 38.3 | 34.0 | 38.1 |
 | Baichuan-7B | 42.3 | 35.6 | 48.9 | 38.4 | 48.1 |
 | LLaMA2-7B | 45.3 | 36.4 | 51.2 | 42.9 | 52.2 |
@@ -214,7 +214,7 @@ In terms of English, Qwen-7B also surpasses other similar open-source pre-traine
 We compared the code capabilities of pre-trained models on [HumanEval](https://github.com/openai/human-eval), and the results are as follows:
 
 | Model | Pass@1 |
-|:--------------:|------:|
+|:--------------|:------:|
 | Baichuan-7B | 9.2 |
 | ChatGLM2-6B | 9.2 |
 | InternLM-7B | 10.4 |
@@ -233,7 +233,7 @@ We compared the code capabilities of pre-trained models on [HumanEval](https://g
 We compared the math capabilities of pre-trained models on [GSM8K](https://github.com/openai/grade-school-math) (8-shot), and the results are as follows:
 
 | Model | Acc. |
-|:--------------:|------:|
+|:--------------|:------:|
 | MPT-7B | 6.8 |
 | Falcon-7B | 6.8 |
 | Baichuan-7B | 9.7 |
@@ -254,7 +254,7 @@ We compared the math capabilities of pre-trained models on [GSM8K](https://githu
 We compared the translation capabilities of pre-trained models on [WMT22](https://www.statmt.org/wmt22/translation-task.html) zh-en and en-zh (5-shot BLEU), and the results are as follows:
 
 | Model | Avg. | zh-en | en-zh |
-|:-----------:|---------:|---------:|---------:|
+|:------------|:--------:|:--------:|:--------:|
 | InternLM-7B | 11.8 | 9.0 | 14.5 |
 | LLaMA-7B | 12.7 | 16.7 | 8.7 |
 | LLaMA-13B | 15.8 | 19.5 | 12.0 |
@@ -329,7 +329,7 @@ model = AutoModelForCausalLM.from_pretrained(
 With this method, it is available to load Qwen-7B in `NF4` and `Int8`, which saves you memory usage. We provide related statistics of model performance below. We find that the quantization downgrades the effectiveness slightly but significantly increases inference efficiency and reduces memory costs.
 
 | Precision | MMLU | Memory |
-| :---------: | -------: | -----: |
+| :--------- | :-------: | :-----: |
 | BF16 | 56.7 | 16.2G |
 | Int8 | 52.8 | 10.1G |
 | NF4 | 48.9 | 7.4G |
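For context on the `@@ -329` hunk: the README passage it touches describes loading Qwen-7B in `NF4` or `Int8` via `AutoModelForCausalLM.from_pretrained`. Below is a minimal sketch of that loading path, assuming the `transformers` + `bitsandbytes` quantization integration and the `Qwen/Qwen-7B` model id; the exact arguments are illustrative assumptions, not taken from this diff.

```python
# Sketch: quantized loading of Qwen-7B, assuming the transformers
# BitsAndBytesConfig integration (settings here are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen-7B"  # assumed model id for this repository

# NF4 4-bit quantization; use load_in_8bit=True instead for the
# Int8 row of the table above.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

# trust_remote_code is needed because the repo ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
```

Per the table above, the trade-off is memory against accuracy: NF4 cuts memory from 16.2G (BF16) to 7.4G while MMLU drops from 56.7 to 48.9, with Int8 sitting in between.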