j5ng committed on
Commit 2a4518b • 1 Parent(s): 7ec6a4b

Update README.md

Files changed (1)
  1. README.md +56 -0
README.md CHANGED
@@ -1,3 +1,59 @@
  ---
  license: apache-2.0
+ language:
+ - ko
+ pipeline_tag: text-generation
  ---
+
+ ### How to use GPTQ model
+
+ The quantization code and helpers live at https://github.com/jongmin-oh/korean-LLM-quantize. Fetch the KULLM prompt template and the `Prompter` helper:
+ ```bash
+ mkdir -p ./templates ./utils
+ wget -P ./templates https://raw.githubusercontent.com/jongmin-oh/korean-LLM-quantize/main/templates/kullm.json
+ wget -P ./utils https://raw.githubusercontent.com/jongmin-oh/korean-LLM-quantize/main/utils/prompter.py
+ ```
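+
+ The examples below also need PyTorch, transformers, and AutoGPTQ. An unpinned install line as a sketch; pin versions to match your CUDA setup:
+
+ ```bash
+ pip install torch transformers auto-gptq
+ ```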
+
+ - If you are in a hurry, run the example code below to test the model right away (it occupies about 19GB of GPU memory).
+ - As of 2023-08-23, Hugging Face officially supports GPTQ in transformers (see the native-loading sketch after the example).
+
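+ To see exactly how `Prompter` lays out the instruction, input, and response sections, you can print the template fetched above. A small sketch using only the standard library:
+
+ ```python
+ import json
+
+ # kullm.json was downloaded into ./templates by the setup commands above.
+ with open("./templates/kullm.json", encoding="utf-8") as f:
+     print(json.dumps(json.load(f), ensure_ascii=False, indent=2))
+ ```
+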
+ ```python
+ from transformers import pipeline
+ from auto_gptq import AutoGPTQForCausalLM
+
+ from utils.prompter import Prompter
+
+ MODEL = "j5ng/kullm-12.8b-GPTQ-8bit"
+
+ # Load the 8-bit GPTQ checkpoint onto a single GPU (~19GB of memory).
+ model = AutoGPTQForCausalLM.from_quantized(MODEL, device="cuda:0", use_triton=False)
+
+ pipe = pipeline("text-generation", model=model, tokenizer=MODEL)
+
+ # Wraps instruction/input pairs in the KULLM prompt template (templates/kullm.json).
+ prompter = Prompter("kullm")
+
+
+ def infer(instruction="", input_text=""):
+     prompt = prompter.generate_prompt(instruction, input_text)
+     output = pipe(
+         prompt,
+         max_length=512,
+         temperature=0.2,  # only takes effect with do_sample=True; beam search drives decoding here
+         repetition_penalty=3.0,
+         num_beams=5,
+         eos_token_id=2,  # EOS id of the polyglot-ko tokenizer
+     )
+     s = output[0]["generated_text"]
+     # Strip the prompt scaffolding and return only the model's answer.
+     return prompter.get_response(s)
+
+
+ # Korean context passage about footballer Son Heung-min (the model expects Korean input).
+ instruction = """
+ 손흥민(한국 한자: 孫興慜, 1992년 7월 8일 ~ )은 대한민국의 축구 선수로 현재 잉글랜드 프리미어리그 토트넘 홋스퍼에서 윙어로 활약하고 있다.
+ 또한 대한민국 축구 국가대표팀의 주장이자 2018년 아시안 게임 금메달리스트이며 영국에서는 애칭인 "소니"(Sonny)로 불린다.
+ 아시아 선수로서는 역대 최초로 프리미어리그 공식 베스트 일레븐과 아시아 선수 최초의 프리미어리그 득점왕은 물론 FIFA 푸스카스상까지 휩쓸었고 2022년에는 축구 선수로는 최초로 체육훈장 청룡장 수훈자가 되었다.
+ 손흥민은 현재 리그 100호를 넣어서 화제가 되고 있다.
+ """
+
+ # Ask (in Korean): "What is Son Heung-min's nickname?"
+ result = infer(instruction=instruction, input_text="손흥민의 애칭은 뭐야?")
+ print(result)  # 손흥민의 애칭은 "소니"입니다. ("Son Heung-min's nickname is 'Sonny'.")
+ ```
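+
+ Since the 2023-08-23 release noted above, transformers can also load GPTQ checkpoints natively, without the `AutoGPTQForCausalLM` wrapper. A minimal sketch, assuming a recent `transformers` with `optimum` and `auto-gptq` installed and that this repo's config carries the GPTQ `quantization_config`; if it does not, fall back to the loader above:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+ MODEL = "j5ng/kullm-12.8b-GPTQ-8bit"
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL)
+ # The GPTQ weights are detected from the repo's quantization config.
+ model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")
+
+ pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
+ ```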
+
+ ### Reference
+
+ - [EleutherAI/polyglot](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)
+ - [Korea University/kullm](https://huggingface.co/nlpai-lab/kullm-polyglot-12.8b-v2)
+ - [GPTQ](https://github.com/IST-DASLab/gptq)