---
license: apache-2.0
datasets:
- jojo0217/korean_rlhf_dataset
language:
- ko
---
This is a test model built as part of an industry-academic cooperation project with Sungkyunkwan University (SKKU).
It is best thought of as a reference model for the training dataset.
It was trained on the existing 107,000 examples plus 2,000 additional everyday-conversation examples.
The measured KoBEST scores are as follows.
![score](./asset/score.png)
The test code is as follows.
```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

model_name = "jojo0217/ChatSKKU5.8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,  # set to False if you want to disable 8-bit quantization
)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)

def answer(message):
    # KoAlpaca-style prompt: "Below is an instruction that describes a task.
    # Write a response that appropriately completes the request."
    prompt = (
        "아래는 작업을 설명하는 명령어입니다. "
        "요청을 적절히 완료하는 응답을 작성하세요."
        f"\n\n### 명령어:\n{message}"
    )
    ans = pipe(
        prompt + "\n\n### 응답:",
        do_sample=True,
        max_new_tokens=512,
        temperature=0.9,
        num_beams=1,
        repetition_penalty=1.0,
        return_full_text=False,
        eos_token_id=2,
    )
    return ans[0]["generated_text"]

answer('성균관대학교에 대해 알려줘')  # "Tell me about Sungkyunkwan University"
```