	---
	license: other
	---
	# GEB-1.3B
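	GEB-1.3B is a lightweight large language model released by 北京集异璧科技有限公司 (GEB-AGI). It has 1.3 billion parameters and was trained on 550B tokens of Chinese and English data. It adopts recent training techniques, including RoPE positional encoding, grouped-query attention, and FlashAttention-2, to speed up training while maintaining model performance. In addition, we fine-tuned the model on 10 million instruction samples to improve alignment, and applied DPO to update the model toward human preferences.
	GEB-1.3B performs well on common benchmarks such as MMLU, C-Eval, and CMMLU, outperforming models of comparable size such as TinyLLaMA-1.1B. Notably, the FP32 version of GEB-1.3B achieves satisfactory inference times on CPU, and we are working to improve speed further through advanced quantization techniques.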

	# Evaluation Results
	| Model          | MMLU  | C-Eval | CMMLU | Average |
	|----------------|-------|--------|-------|---------|
	| Baichuan-7B    | 42.30 | 42.80  | 44.02 | 43.04   |
	| ChatGLM-6B     | 40.63 | 38.90  | -     | 39.77   |
	| GEB-1.3B       | 31.20 | 33.30  | 32.20 | 32.23   |
	| Llama-7B       | 35.10 | 27.10  | 26.75 | 29.65   |
	| Falcon-7B      | 28.00 | -      | -     | 28.00   |
	| MPT-7B         | 27.93 | 27.15  | 26.00 | 27.03   |
	| MindLLM-1.3B   | 26.20 | 26.10  | 25.33 | 25.88   |
	| MindLLM-3B     | 26.20 | 25.70  | 25.00 | 25.63   |
	| TinyLlama-1.1B | 25.34 | 25.02  | 24.03 | 24.80   |
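The Average column appears to be the mean of whichever benchmark scores are reported for a model, with `-` entries left out. The sketch below reproduces that arithmetic for two rows of the table; the function name `benchmark_average` is hypothetical, and the skip-missing rule is an inference from the numbers rather than something the card states.

```python
from typing import Optional, Sequence


def benchmark_average(scores: Sequence[Optional[float]]) -> float:
    """Mean of the reported scores; None stands for a missing ("-") entry.

    Assumption: missing benchmarks are skipped rather than counted as zero.
    """
    reported = [s for s in scores if s is not None]
    if not reported:
        raise ValueError("no reported scores to average")
    return round(sum(reported) / len(reported), 2)


# Scores copied from the table, in the order MMLU, C-Eval, CMMLU.
geb_avg = benchmark_average([31.20, 33.30, 32.20])
falcon_avg = benchmark_average([28.00, None, None])
print(geb_avg, falcon_avg)
```

Because models with missing scores are averaged over fewer benchmarks, their Average values are not strictly comparable with rows that report all three.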

	# Running the Model

	Run inference with the transformers backend:

	```python
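	from transformers import AutoTokenizer, AutoModel
	import torch

	# trust_remote_code=True is required because the model class ships with the repo.
	# The weights are cast to bfloat16 and moved to the GPU.
	model = AutoModel.from_pretrained("GEB-AGI/geb-1.3b", trust_remote_code=True).bfloat16().cuda()
	tokenizer = AutoTokenizer.from_pretrained("GEB-AGI/geb-1.3b", trust_remote_code=True)

	# Single-turn chat; pass the returned history back in to continue the conversation.
	query = "你好"  # "Hello"
	response, history = model.chat(tokenizer, query, history=[])
	print(response)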
	```
	If the download fails, clone the repo manually to fetch the model files locally, then replace the model and tokenizer paths with the local path.
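As a minimal sketch of the local-path fallback described above: the helper below prefers a local clone when one is present and otherwise falls back to the Hub repo id. The names `resolve_model_source` and `load_geb`, the example directory `./geb-1.3b`, and the use of `config.json` as the marker of a complete clone are all assumptions made for illustration, not part of the original instructions.

```python
from pathlib import Path

HUB_REPO_ID = "GEB-AGI/geb-1.3b"


def resolve_model_source(local_dir: str, repo_id: str = HUB_REPO_ID) -> str:
    """Return local_dir if it looks like a cloned checkpoint, else the Hub repo id.

    Assumption: a usable clone contains a config.json at its top level.
    """
    local_path = Path(local_dir)
    if (local_path / "config.json").is_file():
        return str(local_path)
    return repo_id


def load_geb(source: str):
    """Load model and tokenizer from either a Hub repo id or a local directory."""
    # Imported lazily so the path helper can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(source, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(source, trust_remote_code=True)
    return model, tokenizer
```

A call such as `model, tokenizer = load_geb(resolve_model_source("./geb-1.3b"))` would then work the same way whether or not the clone exists.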

	# Inference Speed

	| Hardware | Speed (tokens/s) |
	|:--------:|:----------------:|
	| CPU      | 12               |
	| RTX 3090 | 45               |
	| RTX 4090 | 50               |
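The card does not say how these throughput figures were measured. One way to obtain comparable numbers on your own hardware is sketched below; the helper `tokens_per_second` is hypothetical and simply times a callable that returns how many new tokens it produced.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], warmup: int = 1, runs: int = 3) -> float:
    """Average throughput of `generate`, which must return the number of new tokens.

    Warmup calls are executed first and excluded from the timing.
    """
    for _ in range(warmup):
        generate()
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += generate()
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed
```

To time the model, `generate` could wrap a `model.chat(...)` call and return the token count of the response, for example by re-tokenizing the returned text. On a GPU, consider calling `torch.cuda.synchronize()` at the end of the wrapped call so that pending kernels are included in the measurement.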


	## License

	Use of the GEB-1.3B model weights must comply with the [LICENSE](LICENSE).

	## Citation
	```
	@article{geb-1.3b,
	title={GEB-1.3B: Open Lightweight Large Language Model},
	author={Jie Wu and Yufeng Zhu and Lei Shen and Xuqing Lu},
	journal={arXiv preprint arXiv:2406.09900},
	year={2024}
	}
	```