Update README.md

README.md
CHANGED

@@ -37,42 +37,41 @@ Gigi-Llama-3-8B-zh follows the Llama-3-8B-Instruct chat template; the pad token uses
You can load the model for inference with the code below. For more efficient inference we recommend vLLM (see the sketch after the examples). We will detail the model's performance soon, and will shortly release fine-tuned versions with larger parameter counts and better performance.

Before:

```python
import transformers
import torch

model_id = "yaojialzc/Gigi-Llama-3-8B-zh"

# Standard Llama-3-8B-Instruct text-generation pipeline in bf16
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": "..."},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Stop at either the end-of-text token or the end-of-turn token
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```

llama 3 …
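Both versions build the prompt with `apply_chat_template`. For reference, here is a sketch of what the rendered prompt string looks like under the standard Llama 3 instruct template (an assumption; the authoritative template ships in the checkpoint's tokenizer config), with `add_generation_prompt=True` appending the trailing assistant header:

```python
# Approximate render of the Llama 3 chat template for a single user turn.
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "明朝最后一位皇帝是谁?回答他的名字,然后停止输出<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```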
After:

```python
import torch
from transformers import PreTrainedTokenizerFast, AutoModelForCausalLM

device = "cuda"

model_id = "yaojialzc/Gigi-Llama-3-8B-zh"
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    # "Who was the last emperor of the Ming dynasty? Answer with his name, then stop."
    {"role": "user", "content": "明朝最后一位皇帝是谁?回答他的名字,然后停止输出"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# The template already contains <|begin_of_text|>, so skip extra special tokens
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.01,  # near-greedy decoding
    top_k=50,
    top_p=0.7,
    repetition_penalty=1.0,
    max_length=128,
    pad_token_id=tokenizer.eos_token_id,
)
# Keep special tokens visible to inspect where generation stops
output = tokenizer.decode(output[0], skip_special_tokens=False)
print(output)
```
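Since the paragraph above recommends vLLM for more efficient inference, here is a minimal offline-inference sketch. It assumes vLLM is installed and reuses the `prompt` string built above; the sampling values mirror the example and are otherwise illustrative:

```python
# A sketch, not part of the original README: offline inference with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="yaojialzc/Gigi-Llama-3-8B-zh", dtype="bfloat16")
params = SamplingParams(temperature=0.01, top_p=0.7, max_tokens=128)

# vLLM takes raw prompt strings; reuse the chat-template prompt from above.
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```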

The Llama 3 model does not stop when it emits eot, so it cannot be used out of the box. For now we respect the official behavior: during fine-tuning we guide the model to emit end_of_text directly at the end of its reply, which makes it convenient, for the time being, to fine-tune on downstream domains out of the box.
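If a checkpoint instead ends its turns with `<|eot_id|>` (stock Llama 3 instruct behavior), the terminator trick from the removed example still applies; a sketch, assuming the `tokenizer`, `model`, and `input_ids` objects from the example above:

```python
# Stop at either <|end_of_text|> or <|eot_id|>, whichever appears first.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
output = model.generate(
    input_ids,
    max_new_tokens=128,
    eos_token_id=terminators,
    pad_token_id=tokenizer.eos_token_id,
)
```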