yaojialzc / Gigi-Llama3-8B-Chinese-zh

	---
	license: apache-2.0
	language:
	- en
	- zh
	datasets:
	- teknium/OpenHermes-2.5
	pipeline_tag: text-generation
	tags:
	- llama
	- latest
	library_name: transformers
	---

	![image/webp](https://cdn-uploads.huggingface.co/production/uploads/64ef2a96f2b8f40224d7b407/C7hdFdUqx88oRu_IpcCZi.webp)

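	Gigi is fine-tuned from the state-of-the-art Llama-3-8B-Instruct on more than 1.3 million filtered, high-quality Chinese-English bilingual samples. It handles a wide range of downstream tasks better and returns high-quality results in both languages. The training data includes high-quality instruction-tuning sets such as Hermes and glaive-function-calling, plus a large amount of GPT-4 data translated with GPT-3.5, so Gigi can serve your needs well in both Chinese and English.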

	# Gigi-Llama-3-8B-zh

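	Gigi-Llama-3-8B-zh is the first model in the Gigi series. It was trained on the Hermes, glaive-function-calling, and refgpt_fact_v2 datasets, together with a portion of data translated into Chinese with GPT-3.5, which improved the model's behavior in both Chinese and English. Chinese datasets such as COIG-CQIA and alpaca-gpt4-data-zh were also added to further strengthen its Chinese ability.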

	# How to use

	Gigi-Llama-3-8B-zh follows the Llama-3-8B-Instruct chat template and uses `<|end_of_text|>` as the pad token.

	```
	<|begin_of_text|><|start_header_id|>system<|end_header_id|>

	{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

	{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

	{{ model_answer_1 }}<|eot_id|>
	```
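To make the layout above concrete, here is a minimal sketch that assembles the same prompt string by hand. The helper name `build_llama3_prompt` is hypothetical and not part of any library; in practice `tokenizer.apply_chat_template` does this for you, and the official template may differ in small details such as trimming whitespace around message content.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts in the Llama-3 chat layout shown above.

    Hypothetical helper for illustration only; it does no whitespace
    trimming and no validation of role names.
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n")
        parts.append(f"{message['content']}<|eot_id|>")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are an AI assistant."},
        {"role": "user", "content": "Hello!"},
    ]
    print(build_llama3_prompt(demo))
```

Building the string manually is mainly useful for checking what the tokenizer's template produces, or for serving stacks that expect a raw prompt.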

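	You can load the model and run inference with the code below. For more efficient inference we recommend vLLM. We will describe the model's performance in detail later, and will soon release fine-tuned versions with more parameters and better performance.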

	```python
	import torch
	from transformers import PreTrainedTokenizerFast, AutoModelForCausalLM

	device = "cuda"

	model_id = "yaojialzc/Gigi-Llama-3-8B-zh"
	tokenizer = PreTrainedTokenizerFast.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",
	    torch_dtype=torch.bfloat16,
	)

	messages = [
	    # System prompt: "You are an AI assistant."
	    {"role": "system", "content": "你是一个AI助手。"},
	    # User: "Who was the last emperor of the Ming dynasty? Answer with his name, then stop."
	    {"role": "user", "content": "明朝最后一位皇帝是谁？回答他的名字，然后停止输出"},
	]
	# Render the conversation with the chat template and open an assistant turn.
	prompt = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	# The rendered prompt already contains the special tokens, so add none here.
	input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(device)

	output = model.generate(
	    input_ids,
	    do_sample=True,
	    temperature=0.01,  # very low temperature: close to greedy decoding
	    top_k=50,
	    top_p=0.7,
	    repetition_penalty=1,
	    max_length=128,  # counts prompt tokens plus generated tokens
	    pad_token_id=tokenizer.eos_token_id,
	)
	# Decode the full sequence (prompt + answer), keeping special tokens visible.
	output = tokenizer.decode(output[0], skip_special_tokens=False)
	print(output)
	```

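	The Llama 3 model does not stop generating when it outputs `<|eot_id|>`, so it cannot be used out of the box. For now we respect the official behavior and, during fine-tuning, teach the model to output `<|end_of_text|>` directly at the end, so that it currently works out of the box for fine-tuning in downstream domains.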
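If you decode with `skip_special_tokens=False`, or run a checkpoint that keeps generating past the end of a turn, you may want to cut the generated text at the first terminator yourself. The following is a minimal sketch of that post-processing step. The helper name `trim_at_stop` and the `STOP_TOKENS` tuple are assumptions made for illustration; they are not part of this model's code or of transformers.

```python
# Hypothetical stop markers, taken from the chat template in this card.
STOP_TOKENS = ("<|eot_id|>", "<|end_of_text|>")


def trim_at_stop(generated_text, stop_tokens=STOP_TOKENS):
    """Return the text before the earliest stop token, stripped of whitespace.

    `generated_text` is assumed to hold only the newly generated part,
    not the prompt. If no stop token occurs, the whole text is returned.
    """
    cut = len(generated_text)
    for token in stop_tokens:
        index = generated_text.find(token)
        if index != -1:
            cut = min(cut, index)
    return generated_text[:cut].strip()


if __name__ == "__main__":
    raw = "Hello there.<|end_of_text|>"
    print(trim_at_stop(raw))
```

To get only the newly generated part from the inference code above, one option is to decode the slice of the `generate` result after the prompt, for example `result[0][input_ids.shape[-1]:]`, where `result` is the tensor returned by `model.generate`.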