---
license: cc-by-sa-4.0
language:
- en
tags:
- text-generation-inference
pipeline_tag: text-generation
---


## Original model card 

Buy me a coffee if you like this project ;)
<a href="https://www.buymeacoffee.com/s3nh"><img src="https://www.buymeacoffee.com/assets/img/guidelines/download-assets-sm-1.svg" alt=""></a>

#### Description 

GPTQ-quantized (compressed) version of [this project](https://huggingface.co/AlpachinoNLP/Baichuan-13B-Instruction/).


### inference 


# Original model card


## Usage

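The following example holds a conversation with Baichuan-13B-Instruction. The expected output is "乔戈里峰。世界第二高峰———乔戈里峰西方登山者称其为k2峰，海拔高度是8611米，位于喀喇昆仑山脉的中巴边境上" (roughly: "Chogori. The world's second-highest peak, Chogori, known to Western climbers as K2, is 8,611 m high and lies on the China-Pakistan border in the Karakoram range.").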
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "AlpachinoNLP/Baichuan-13B-Instruction"
# trust_remote_code=True is required: the tokenizer, model class and `chat` method ship with the repo
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained(model_id)

# Conversation history; the prompt asks "Which is the second-highest mountain in the world?"
messages = []
messages.append({"role": "Human", "content": "世界上第二高的山峰是哪座"})
response = model.chat(tokenizer, messages)  # `chat` is provided by the model's remote code
print(response)
```

## Quantized deployment

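Baichuan-13B supports int8 and int4 quantization; enabling it only requires changing two lines of the inference code. Note that if you quantize in order to save GPU memory, load the full-precision model onto the CPU first and then quantize it. Avoid passing `device_map='auto'`, or any other argument that would load the full-precision model directly onto the GPU, to `from_pretrained`.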
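
As a rough back-of-the-envelope check of why quantization saves memory, the sketch below estimates the weight-only footprint of Baichuan-13B from the total parameter count listed for it in the model details section (13,264,901,120). It assumes 2 bytes per parameter for fp16, 1 for int8 and 0.5 for int4, and it ignores activations, the KV cache and any scale tensors stored by the quantizer, so real usage will be higher.

```python
# Rough weight-only memory estimate for Baichuan-13B at different precisions.
# Assumption: bytes per parameter are 2 (fp16), 1 (int8), 0.5 (int4);
# activations, KV cache and quantization scales are ignored.

TOTAL_PARAMS = 13_264_901_120  # total parameters of Baichuan-13B (model details table)

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(n_params: int, precision: str) -> float:
    """Return the approximate size of the weights in GiB."""
    return n_params * BYTES_PER_PARAM[precision] / 1024**3


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: ~{weight_memory_gib(TOTAL_PARAMS, precision):.1f} GiB")
```

This prints roughly 24.7 GiB for fp16, 12.4 GiB for int8 and 6.2 GiB for int4, which is why loading the fp16 weights straight onto a smaller GPU can fail before `quantize` ever runs.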

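To use int8 quantization:
```python
import torch
from transformers import AutoModelForCausalLM
# Load the fp16 weights on the CPU (no device_map), quantize to int8, then move to the GPU
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()
```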

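Similarly, to use int4 quantization:
```python
import torch
from transformers import AutoModelForCausalLM
# Same pattern: load on the CPU in fp16, quantize to int4, then move to the GPU
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
```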
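
To give an intuition for what lowering the bit width does to the weights, here is a toy symmetric (absmax) round-to-nearest quantizer applied to a single row of weights. It is a generic illustration, not the algorithm behind Baichuan's `quantize` method or GPTQ, and `quantize_row`, `dequantize_row` and `max_abs_error` are hypothetical helpers.

```python
# Toy symmetric (absmax) round-to-nearest weight quantization.
# Illustrative only: NOT the implementation behind Baichuan's `model.quantize`
# or GPTQ; the helper names are hypothetical.

from typing import List, Tuple


def quantize_row(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8, 7 for int4
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_row(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the shared scale."""
    return [v * scale for v in q]


def max_abs_error(weights: List[float], bits: int) -> float:
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    q, scale = quantize_row(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, dequantize_row(q, scale)))


if __name__ == "__main__":
    row = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.05, 0.64]
    for bits in (8, 4):
        q, scale = quantize_row(row, bits)
        print(f"int{bits}: q={q}, max error={max_abs_error(row, bits):.4f}")
```

With 8 bits the reconstruction error stays below half a quantization step (about 0.0036 for this row); with 4 bits it grows to roughly 0.06. That gap is the accuracy-versus-memory trade-off behind choosing `quantize(8)` or `quantize(4)`.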

## Model details


### Model architecture


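The model is based on Baichuan-13B. To improve inference performance, Baichuan-13B uses ALiBi (Attention with Linear Biases), which needs less computation than Rotary Embedding and noticeably speeds up inference. Compared with the standard LLaMA-13B, the measured average inference speed (tokens/s) when generating 2,000 tokens is 31.6% higher: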

| Model        | tokens/s |
| ------------ | -------- |
| LLaMA-13B    | 19.4     |
| Baichuan-13B | 25.4     |
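
For readers unfamiliar with ALiBi, the sketch below builds the per-head slopes and the causal attention bias described in the ALiBi paper. Instead of rotating queries and keys as RoPE does, ALiBi adds a fixed penalty of -m_h·(i - j) to the attention score between query position i and key position j, so the bias can be precomputed once per head. The slope recipe follows the reference ALiBi implementation, including its fallback for head counts that are not powers of two (such as Baichuan-13B's 40 heads); whether Baichuan's own modeling code matches it exactly is an assumption, and the function names are hypothetical.

```python
# Sketch of ALiBi (Attention with Linear Biases) bias construction.
# Slopes follow the reference ALiBi recipe; whether Baichuan-13B's modeling
# code matches it exactly is an assumption. Helper names are hypothetical.

import math
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """Geometric sequence of per-head slopes m_h."""
    def power_of_2_slopes(n: int) -> List[float]:
        start = 2 ** (-8 / n)               # first slope; the ratio equals start
        return [start ** (i + 1) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return power_of_2_slopes(n_heads)
    # Not a power of two (e.g. 40 heads): use the slopes for the closest lower
    # power of two, then fill the rest with every other slope of twice that size.
    closest = 2 ** math.floor(math.log2(n_heads))
    extra = alibi_slopes(2 * closest)[0::2][: n_heads - closest]
    return power_of_2_slopes(closest) + extra


def alibi_bias(n_heads: int, seq_len: int) -> List[List[List[float]]]:
    """bias[h][i][j] = m_h * (j - i) for keys j <= i; -inf masks future keys."""
    return [
        [
            [m * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(n_heads)
    ]


if __name__ == "__main__":
    print(alibi_slopes(8))        # 1/2, 1/4, ..., 1/256
    print(len(alibi_slopes(40)))  # one slope per head for 40 heads
    print(alibi_bias(2, 3)[0])    # head 0: 3x3 causal bias matrix
```

The bias is simply added to the query-key scores before the softmax, so distant tokens are penalized linearly and each head decays at its own rate.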

The detailed parameters are listed in the table below:

| Model        | Hidden size | Layers | Attention heads | Vocabulary size | Total parameters | Training data (tokens) | Position encoding                         | Max length |
| ------------ | ----------- | ------ | --------------- | --------------- | ---------------- | ---------------------- | ----------------------------------------- | ---------- |
| Baichuan-7B  | 4,096       | 32     | 32              | 64,000          | 7,000,559,616    | 1.2 trillion           | [RoPE](https://arxiv.org/abs/2104.09864)  | 4,096      |
| Baichuan-13B | 5,120       | 40     | 40              | 64,000          | 13,264,901,120   | 1.4 trillion           | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096      |
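
The total parameter counts in the table above can be reproduced from the other columns with a short calculation. The sketch assumes a LLaMA-style decoder: no bias terms, an input embedding and a separate (untied) output head, four d×d attention projections, a gated SwiGLU MLP with three matrices, two RMSNorm weight vectors per layer plus a final one. The MLP intermediate sizes (11,008 for Baichuan-7B and 13,696 for Baichuan-13B) are not in this card; they are taken from the models' published configuration files and should be treated as an assumption.

```python
# Reproduce the total parameter counts from the architecture table.
# Assumptions (not stated in this card): LLaMA-style blocks without biases,
# untied embedding / output head, SwiGLU MLP, RMSNorm; intermediate sizes
# 11,008 (7B) and 13,696 (13B) come from the public config files.


def count_params(hidden: int, layers: int, vocab: int, intermediate: int) -> int:
    attention = 4 * hidden * hidden       # Q, K, V and output projections
    mlp = 3 * hidden * intermediate       # gate, up and down projections
    norms = 2 * hidden                    # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden       # input embedding + separate LM head
    final_norm = hidden                   # RMSNorm after the last layer
    return layers * per_layer + embeddings + final_norm


if __name__ == "__main__":
    print(f"Baichuan-7B:  {count_params(4096, 32, 64000, 11008):,}")
    print(f"Baichuan-13B: {count_params(5120, 40, 64000, 13696):,}")
```

Both results match the table exactly (7,000,559,616 and 13,264,901,120), which suggests the assumed layout is consistent with the published models.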

## Training details

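The training data consists of three main parts:

* 13k high-quality samples filtered from the [sharegpt_zh](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/ShareGPT) dataset.
* The [lima](https://huggingface.co/datasets/GAIR/lima) dataset.
* 2.3k high-quality Chinese samples selected by task type, with roughly 100 samples per task type.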

Hardware: 8× NVIDIA A40

## Evaluation results

### [CMMLU](https://github.com/haonan-li/CMMLU)

| Model (5-shot)                                             |   STEM    | Humanities | Social Sciences |  Others  | China Specific | Average  |
| ---------------------------------------------------------- | :-------: | :--------: | :-------------: | :------: | :------------: | :------: |
| Baichuan-7B                                                |   34.4    |    47.5    |      47.6       |   46.6   |      44.3      |   44.0   |
| Vicuna-13B                                                 |   31.8    |    36.2    |      37.6       |   39.5   |      34.3      |   36.3   |
| Chinese-Alpaca-Plus-13B                                    |   29.8    |    33.4    |      33.2       |   37.9   |      32.1      |   33.4   |
| Chinese-LLaMA-Plus-13B                                     |   28.1    |    33.1    |      35.4       |   35.1   |      33.5      |   33.0   |
| Ziya-LLaMA-13B-Pretrain                                    |   29.0    |    30.7    |      33.8       |   34.4   |      31.9      |   32.1   |
| LLaMA-13B                                                  |   29.2    |    30.8    |      31.6       |   33.0   |      30.5      |   31.2   |
| moss-moon-003-base (16B)                                   |   27.2    |    30.4    |      28.8       |   32.6   |      28.7      |   29.6   |
| Baichuan-13B-Base                                          |   41.7    |    61.1    |    **59.8**     | **59.0** |    **56.4**    |   55.3   |
| Baichuan-13B-Chat                                          |   42.8    |  **62.6**  |      59.7       | **59.0** |      56.1      | **55.8** |
| **Baichuan-13B-Instruction**                               | **44.50** |   61.16    |      59.07      |  58.34   |     55.55      |  55.61   |

| Model (zero-shot)                                            |   STEM    | Humanities | Social Sciences |  Others   | China Specific |  Average  |
| ------------------------------------------------------------ | :-------: | :--------: | :-------------: | :-------: | :------------: | :-------: |
| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b)      |   41.28   |   52.85    |      53.37      |   52.24   |     50.58      |   49.95   |
| [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B)   |   32.79   |   44.43    |      46.78      |   44.79   |     43.11      |   42.33   |
| [ChatGLM-6B](https://github.com/THUDM/GLM-130B)              |   32.22   |   42.91    |      44.81      |   42.60   |     41.93      |   40.79   |
| [BatGPT-15B](https://arxiv.org/abs/2307.00360)               |   33.72   |   36.53    |      38.07      |   46.94   |     38.32      |   38.51   |
| [Chinese-LLaMA-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) |   26.76   |   26.57    |      27.42      |   28.33   |     26.73      |   27.34   |
| [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS)            |   25.68   |   26.35    |      27.21      |   27.92   |     26.70      |   26.88   |
| [Chinese-GLM-10B](https://github.com/THUDM/GLM)              |   25.57   |   25.01    |      26.33      |   25.94   |     25.81      |   25.80   |
| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) |   42.04   |   60.49    |      59.55      |   56.60   |     55.72      |   54.63   |
| [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) |   37.32   |   56.24    |      54.79      |   54.07   |     52.23      |   50.48   |
| **Baichuan-13B-Instruction**                                 | **42.56** | **62.09**  |    **60.41**    | **58.97** |   **56.95**    | **55.88** |

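> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to assess the knowledge and reasoning abilities of language models in a Chinese-language context. We evaluated the models directly with its official [evaluation scripts](https://github.com/haonan-li/CMMLU). In the zero-shot table, the score of [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) was obtained by running the official CMMLU evaluation scripts ourselves, and the scores of the other models are taken from the official [CMMLU](https://github.com/haonan-li/CMMLU/tree/master) results. In the 5-shot table, the scores of the other models are taken from the official [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) results.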