jojo0217 committed
Commit 16c6a3a
Parent(s): f12c0b4

Update README.md

Files changed (1): README.md +45 -1
README.md CHANGED
@@ -4,4 +4,48 @@ datasets:
  - jojo0217/korean_rlhf_dataset
  language:
  - ko
- ---
+ ---
+
+ This is a test model built as part of a Sungkyunkwan University industry-academia cooperation program.
+ It is best thought of as a reference model for the training data.
+ It was trained on the existing 107,000 examples plus 2,000 additional everyday-conversation examples.
+
+ The measured KoBEST scores are as follows:
+ ![score](./asset/score.png)
+
+ The test code is as follows:
18
+ ```python
+ from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "jojo0217/ChatSKKU5.8B"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     device_map="auto",
+     load_in_8bit=True,  # set to False to disable 8-bit quantization
+ )
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,  # pass the tokenizer object, not the model name string
+     device_map="auto"
+ )
+
+ def answer(message):
+     # Korean instruction header: "Below is an instruction that describes a task.
+     # Write a response that appropriately completes the request."
+     prompt = f"μ•„λž˜λŠ” μž‘μ—…μ„ μ„€λͺ…ν•˜λŠ” λͺ…λ Ήμ–΄μž…λ‹ˆλ‹€. μš”μ²­μ„ 적절히 μ™„λ£Œν•˜λŠ” 응닡을 μž‘μ„±ν•˜μ„Έμš”.\n\n### λͺ…λ Ήμ–΄:\n{message}"
+     ans = pipe(
+         prompt + "\n\n### 응닡:",  # "### Response:"
+         do_sample=True,
+         max_new_tokens=512,
+         temperature=0.9,
+         num_beams=1,
+         repetition_penalty=1.0,
+         return_full_text=False,
+         eos_token_id=2,
+     )
+     return ans[0]["generated_text"]
+
+ answer('μ„±κ· κ΄€λŒ€ν•™κ΅μ—λŒ€ν•΄ μ•Œλ €μ€˜')  # "Tell me about Sungkyunkwan University"
+ ```
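The Korean strings in the test code form the instruction/response prompt template the model expects at inference time. As a minimal sketch (the `build_prompt` helper is hypothetical, not part of the original card), the template can be assembled and inspected without loading the model:

```python
def build_prompt(message: str) -> str:
    """Assemble the instruction-style prompt used in the test code above.

    Hypothetical helper for illustration; the header text translates to:
    "Below is an instruction that describes a task. Write a response that
    appropriately completes the request."
    """
    header = (
        "μ•„λž˜λŠ” μž‘μ—…μ„ μ„€λͺ…ν•˜λŠ” λͺ…λ Ήμ–΄μž…λ‹ˆλ‹€. "
        "μš”μ²­μ„ 적절히 μ™„λ£Œν•˜λŠ” 응닡을 μž‘μ„±ν•˜μ„Έμš”."
    )
    # "### λͺ…λ Ήμ–΄" = "### Instruction", "### 응닡" = "### Response"
    return f"{header}\n\n### λͺ…λ Ήμ–΄:\n{message}\n\n### 응닡:"

prompt = build_prompt("μ„±κ· κ΄€λŒ€ν•™κ΅μ—λŒ€ν•΄ μ•Œλ €μ€˜")
print(prompt)
```

Since `return_full_text=False` is set in the pipeline call, the generated text contains only what the model appends after the trailing `### 응닡:` marker.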