# komt-Llama-2-13b-hf-lora
This model was fine-tuned with PEFT-LoRA on top of the Llama-2-13b-hf base model.

The "komt-Llama-2-13b-hf-lora" model was developed using a multi-task instruction technique aimed at enhancing Korean language performance. For more detailed information, please refer to the GitHub repository and to https://huggingface.co/davidkim205/komt-Llama-2-13b-hf.
## Model Details

- Model Developers: davidkim (changyeon kim)
- Repository: https://github.com/davidkim205/komt
- LoRA target modules: q_proj, o_proj, v_proj, gate_proj, down_proj, k_proj, up_proj (a configuration sketch follows this list)
- Model Size: 120MB (LoRA adapter weights)
- Model Architecture: komt-Llama-2-13b is an auto-regressive language model that uses an optimized transformer architecture. The tuned version uses supervised fine-tuning on multi-task instruction data.
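These target modules cover the attention projections and the MLP projections of the Llama architecture. As a rough illustration, a PEFT configuration with these target modules might look like the sketch below; the rank, alpha, and dropout values are placeholder assumptions, since the card does not state them:

```python
from peft import LoraConfig

# Minimal sketch of a LoRA config matching the target modules listed above.
# r, lora_alpha, and lora_dropout are illustrative, not the training values.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "o_proj", "v_proj", "gate_proj",
                    "down_proj", "k_proj", "up_proj"],
)
```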
## Dataset

Korean multi-task instruction dataset
## Prompt Template

```
### instruction: {prompt}

### Response:
```
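For programmatic use, the template can be filled in with a small helper. This is a sketch of our own (the function name is not part of the repository):

```python
def build_prompt(instruction: str) -> str:
    # Fill the komt prompt template with a user instruction.
    return f"### instruction: {instruction}\n\n### Response:"
```

A complete worked example, including the model's actual response, appears at the end of the Usage section below.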
## Usage

After cloning the repository from GitHub, install the requirements as follows:

```bash
git clone https://github.com/davidkim205/komt
cd komt
pip install -r lora/requirements_lora.txt
```

- Requirements: Python >= 3.8, a Linux distribution (e.g. Ubuntu), and CUDA > 10.0. Refer to https://github.com/TimDettmers/bitsandbytes#tldr.
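Since 4-bit loading with bitsandbytes requires a working CUDA setup, a quick sanity check before running the example below can save time. This snippet is our suggestion, not part of the repository:

```python
import torch

# Verify that PyTorch can see a CUDA device before attempting 4-bit loading.
assert torch.cuda.is_available(), "a CUDA device is required for 4-bit inference"
print(torch.cuda.get_device_name(0))
```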
The following example loads the base model in 4-bit with bitsandbytes, applies the LoRA adapter, and generates a streamed response:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import StoppingCriteria, StoppingCriteriaList
from transformers import TextStreamer, GenerationConfig
from peft import PeftModel, PeftConfig


class LocalStoppingCriteria(StoppingCriteria):
    """Stop generation as soon as the decoded output ends with a stop word."""

    def __init__(self, tokenizer, stop_words=[]):
        super().__init__()
        stops = [tokenizer(stop_word, return_tensors='pt', add_special_tokens=False)['input_ids'].squeeze()
                 for stop_word in stop_words]
        print('stop_words', stop_words)
        print('stop_words_ids', stops)
        self.stop_words = stop_words
        self.stops = [stop.cuda() for stop in stops]
        self.tokenizer = tokenizer

    def _compare_token(self, input_ids):
        # Token-level check: do the last generated tokens match a stop sequence?
        for stop in self.stops:
            if len(stop.size()) != 1:
                continue
            stop_len = len(stop)
            if torch.all((stop == input_ids[0][-stop_len:])).item():
                return True
        return False

    def _compare_decode(self, input_ids):
        # String-level check: decode the output and compare its suffix.
        input_str = self.tokenizer.decode(input_ids[0])
        for stop_word in self.stop_words:
            if input_str.endswith(stop_word):
                return True
        return False

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
        return self._compare_decode(input_ids)


#
# config
peft_model_name = 'davidkim205/komt-Llama-2-13b-hf-lora'
model_name = 'davidkim205/komt-Llama-2-13b-hf'
instruction_prefix = "### instruction: "
input_prefix = "### input: "
answer_prefix = "### Response: "
endoftext = "<|end|>"
stop_words = [endoftext, '<s>', '###']
generation_config = GenerationConfig(
    temperature=0.9,
    top_p=0.7,
    top_k=100,
    max_new_tokens=2048,
    early_stopping=True,
    do_sample=True,
)

#
# create model
config = PeftConfig.from_pretrained(peft_model_name)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config,
                                             device_map="auto")
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
stopping_criteria = StoppingCriteriaList([LocalStoppingCriteria(tokenizer=tokenizer, stop_words=stop_words)])
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.eval()

#
# generate
prompt = f"### instruction: 영화 해리포터 시리즈 배급사가 어디야\n\n### Response:"
gened = model.generate(
    **tokenizer(
        prompt,
        return_tensors='pt',
        return_token_type_ids=False
    ).to('cuda'),
    generation_config=generation_config,
    eos_token_id=model.config.eos_token_id,
    stopping_criteria=stopping_criteria,
    streamer=streamer
)
output_text = tokenizer.decode(gened[0], skip_special_tokens=True)
print('--------------------')
print(output_text)
```
response:

```
워너 브라더스
그 외에도 헤다스 같은 악의 역할을 맡은 사람들도 있었다. 데이비드 알바레즈는 볼드모트 군단의 보로미르 7개 학교 중 하나인 필립스부르그 보로미르 7개 학교의 교장이며, 필리우스 플리트우드는 필립스부르그 보로미르 7개 학교의 교사이다. 필립스부르그는 보로미르 7개 학교 중 하나로, 보로미르 전설에 따르면 보로미르 7개 학교의 교장들이 죽으면 세계를 멸망시킨다는 점을 알고 있다. 《해리 포터와 혼혈 왕자》에서 데스 이터는 해리 포터를 죽이려고 하지만 실패하고 해리 포터는 데스 이터를 죽이려고 하지만 또 실패한다. 《해리 포터와 아즈카반의 죄수》에서 데스 이터는 아즈카반의 죄수로 등장하여 해리 포터에게 복수를 하고자 하지만 해리 포터와 론 위즐리의 발 빠른 대처로 실패하고 해리 포터와 론 위즐리는 데스 이터를 죽이려고 하지만 실패하고 그 결과 데스 이터는 다시 기절치에 내세는다. 《해리 포터와 죽음의 성물》에서 데스 이터는 부활하여 죽음의 성물 1부에 등장하였으며, 해리 포터와 죽음의 성물 2부에서 데스 이터는 죽음의 성물 2부의 내용을 방해하는 간첩 역할을 한다. 데스 이터는 영화 해리포터와 죽음의 성물 1부에서 데스 이터의 역할을 맡은 배우 스티븐 피카드가 연기한다.
```
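The generated text contains the prompt followed by the model's answer, and possibly a trailing stop word. If only the answer is needed, splitting on the response marker is one option. This post-processing helper is our suggestion, not part of the repository:

```python
def extract_response(output_text: str) -> str:
    # Keep only the text after the last "### Response:" marker and
    # trim trailing stop words such as '###' or '<|end|>'.
    answer = output_text.split("### Response:")[-1]
    for stop_word in ('<|end|>', '###'):
        answer = answer.split(stop_word)[0]
    return answer.strip()

print(extract_response(output_text))
```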
## Hardware and Software
- NVIDIA driver: 535.54.03
- CUDA version: 12.2
## Training procedure

The following bitsandbytes quantization config was used during training (a sketch reconstructing the equivalent `BitsAndBytesConfig` follows the list):
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
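Assuming the standard `transformers` BitsAndBytesConfig API, the values listed above correspond to a config like the following sketch; the `llm_int8_*` fields are defaults that only take effect in 8-bit mode:

```python
import torch
from transformers import BitsAndBytesConfig

# Reconstruction of the training-time quantization config listed above.
bnb_training_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```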
## Framework versions

- PEFT 0.5.0.dev0