File size: 9,849 Bytes
91be293 c7d77d2 91be293 c7d77d2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 |
---
license: gpl-3.0
language:
- zh
- en
library_name: transformers
---
# Ziya-LLaMA-13B-v1
- Main Page:[Fengshenbang](https://fengshenbang-lm.com/)
- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
# 姜子牙系列模型
- [Ziya-LLaMA-13B-v1.1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1.1)
- [Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1)
- [Ziya-LLaMA-7B-Reward](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-7B-Reward)
- [Ziya-LLaMA-13B-Pretrain-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1)
- [Ziya-BLIP2-14B-Visual-v1](https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1)
## 简介 Brief Introduction
姜子牙写作大模型V1是基于LLaMa的130亿参数的指令微调模型,在写作任务上进行了能力增强,是专注于写作的大模型。姜子牙写作模型可以完成公文报告、讲稿书信、创意文案等多类的写作任务。
Ziya-Writing-LLaMa-13B-v1 is a 13-billion parameter instruction fine-tuned model based on LLaMa, which has been enhanced for better performance in writing tasks. It is a large model that focuses on writing. Ziya-Writing-LLaMa-13B-v1 can handle several types of writing tasks, including official reports, speeches, creative copywriting, and more.
更多细节可以参考我们的公众号文章:
## 软件依赖
```
pip install torch==1.12.1 tokenizers==0.13.3 git+https://github.com/huggingface/transformers
```
## 模型分类 Model Taxonomy
| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 写作 Writing | AGI模型 | 姜子牙 Ziya | LLaMA | 13B | English&Chinese |
## 模型信息 Model Information
### 有监督微调 Supervised finetuning
我们从网络中收集并清洗了大量真实的真人写作数据,利用GPT-3.5生成对应的写作指令,并进行了极为严格的人工校验。
在此基础上,我们利用奖励模型和一定的清洗逻辑,精心挑选了难度更高的写作指令,剔除了简单的数据,并保证了指令的多样性。
我们利用evol-instruct的方法,生成了约30万条高质量的通用指令数据。我们混合了通用指令数据和写作指令数据,这使得ziya-writing不仅拥有良好的意图理解能力,也能够生成优秀的回答。
We collected and cleaned a large amount of real human writing data from the internet and used GPT-3.5 to generate corresponding writing instructions which have undergone extremely strict manual verification.
Based on this, we used a reward model and certain cleaning logic to carefully select more challenging writing instructions, eliminating simple data, and ensuring the diversity of instructions.
We used the evol-instruct method to generate about 300,000 high-quality general instruction data. We mixed general instruction data and writing instruction data, which made ziya-writing not only have good intention understanding ability, but also can generate excellent responses.
### 人类反馈学习 Human-Feedback training
我们在实验中发现,利用少量人类标注的高质量的写作排序数据,使用强化学习训练模型,就能对进一步拔高模型的写作效果。
为了进一步提升模型的表现,使其能够充分理解人类意图、减少“幻觉”和不安全的输出,基于指令微调后的模型,进行了人类反馈训练(Human-Feedback Training,HFT)。在训练中,我们采用了以人类反馈强化学习(RM、PPO)为主。
我们在内部自研的框架上实现了HFT的训练流程,该框架可以利用最少8张40G的A100显卡完成Ziya-Writing-LLaMA-13B-v1的全参数训练。在PPO训练中,我们没有限制生成样本的长度,以确保长文本任务的奖励准确性。每次训练的总经验池尺寸超过100k样本,确保了训练的充分性。
In our experiment, we found that by using a small amount of high-quality human-annotated writing ranking data and training the model with reinforcement learning, we could effectively improve the writing performance of the model.
To further improve the performance of the model, enabling it to fully understand human intentions, reduce "hallucinations" and unsafe outputs, we conducted Human-Feedback Training (HFT) based on the model fine-tuned with instructions. In the training process, we used human feedback reinforcement learning (RM, PPO).
We implemented the HFT training process on an internally developed framework, which can use a minimum of 8 40GB A100 GPUs to complete the full parameter training of Ziya-Writing-LLaMA-13B-v1. In the PPO training, we did not limit the length of the generated samples to ensure the accuracy of rewards for long-text tasks. The total experience pool size for each training exceeded 100k samples, ensuring the sufficiency of the training.
### 效果评估 Performance
写作文案的优劣评价是一个较为主观的评判,很难用一个准确率或者满意度的打分来衡量。因此,我们使用了匿名模型多人Side-by-Side评估的机制,收集了100条不同难度的写作指令数据进行评估,我们后续也会公开这个评测集。
我们以胜出率作为评价模型好坏的指标,一个模型的胜出率计算公式为:
胜出率=(该模型的胜出数量+打平数量/2)/总标注数
一般而言,由于语言模型大多基于采样来生成回答,因此胜出率大于55%表示该模型显著胜出于另外一个模型,胜出率小于45%表示该模型明显落后,胜出率在45%至55%之间表示两个模型基本持平。
The evaluation of the quality of a writing task is quite subjective, making it difficult to measure with precise accuracy or satisfaction score. Therefore, we've used an anonymous multi-person Side-by-Side evaluation mechanism, and have collected 100 pieces of writing instruction data of different difficulties for evaluation. We will also make this evaluation set public in the future.
We use the win rate as an indicator of the quality of a model. The formula to calculate a model's win rate is as follows:
Win Rate = (Number of wins for the model + Number of draws / 2) / Total number of annotations
Generally, since most language models generate responses based on sampling, hence, a win rate greater than 55% indicates that the model significantly outperforms another model, a win rate less than 45% shows that the model clearly lags behind, and a win rate between 45% and 55% signifies that the two models are essentially on par.
| Ziya-Writing-LLaMa-13B-v1 | 平均胜出率 | 最大胜出率 | 最小胜出率 |
| :----: | :----: | :----: | :----: |
| vs Ziya-LLaMa-13B-v1.1 | 70.7 | 73.5 | 69 |
| vs baichuan-vicuna-7b | 69.6 | 73.5 | 68 |
| vs Moss-16B | 65.1 | 69 | 62 |
| vs ChatGLM2-6B | 58.3 | 61.5 | 56 |
| vs Minimax-abab5 | 52.3 | 53 | 50.5 |
| vs GPT-3.5-turbo | 44.7 | 49.5 | 38 |
(注:最大胜出率和最小胜出率,是对每一个标注人员的标注结果进行单独统计,计算出最大和最小的得分;平均胜出率是对所有标注人员的标注结果进行汇总统计,计算出平均的得分。)
## <span id="jump"> 使用 Usage </span>
由于LLaMA权重的许可限制,该模型不能用于商业用途,请严格遵守LLaMA的使用政策。
```python3
from transformers import AutoTokenizer
from transformers import LlamaForCausalLM
import torch
device = torch.device("cuda")
query="帮我写一份去西安的旅游计划"
model = LlamaForCausalLM.from_pretrained("IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1", use_fast=False)
inputs = '<human>:' + query.strip() + '\n<bot>:'
input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
generate_ids = model.generate(
input_ids,
max_new_tokens=2048,
do_sample = True,
top_p = 0.85,
temperature = 0.85,
repetition_penalty=1.,
eos_token_id=2,
bos_token_id=1,
pad_token_id=0)
output = tokenizer.batch_decode(generate_ids)[0]
print(output)
## 微调示例 Finetune Example
Refer to [ziya_finetune](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/ziya_llama)
## 推理量化示例 Inference & Quantization Example
Refer to [ziya_inference](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/ziya_inference)
## 引用 Citation
如果您在您的工作中使用了我们的模型,可以引用我们的[论文](https://arxiv.org/abs/2210.08590):
If you are using the resource for your work, please cite the our [paper](https://arxiv.org/abs/2210.08590):
```text
@article{fengshenbang,
author = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
title = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
journal = {CoRR},
volume = {abs/2209.02970},
year = {2022}
}
```
You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
欢迎引用我们的[网站](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
```text
@misc{Fengshenbang-LM,
title={Fengshenbang-LM},
author={IDEA-CCNL},
year={2021},
howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```
|