komt-Llama-2-13b-hf-lora

This model was fine-tuned from the Llama-2-13b-hf base model using PEFT LoRA.

The "komt-Llama-2-13b-hf-lora" model was developed using a multi-task instruction technique aimed at enhancing Korean language performance. For more details, please refer to the GitHub Repository. Please refer below for more detailed information.

For more detailed information, please refer to the https://huggingface.co/davidkim205/komt-Llama-2-13b-hf.

Model Details

  • Model Developers: davidkim (Changyeon Kim)
  • Repository: https://github.com/davidkim205/komt
  • LoRA target modules: q_proj, o_proj, v_proj, gate_proj, down_proj, k_proj, up_proj (see the LoraConfig sketch after this list)
  • Model Size: 120MB
  • Model Architecture: komt-Llama-2-13b is an auto-regressive language model based on an optimized transformer architecture. The tuned version uses supervised fine-tuning with multi-task instruction data.
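As a point of reference, the LoRA target modules above correspond to a PEFT LoraConfig along the following lines. This is a minimal sketch only: the rank, alpha, and dropout values are assumptions, since the card does not state the training hyperparameters.

from peft import LoraConfig

# Sketch of a LoraConfig matching the target modules listed above.
# r, lora_alpha, and lora_dropout are assumed values, not from the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "o_proj", "v_proj", "gate_proj",
                    "down_proj", "k_proj", "up_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)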

Dataset

Korean multi-task instruction dataset

Prompt Template

### instruction: {prompt}

### Response: 
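In code, the template can be applied with a small helper such as the following minimal sketch (the build_prompt name is ours, not part of the komt repository):

def build_prompt(instruction: str) -> str:
    # Wrap a user query in the instruction/response template shown above.
    return f"### instruction: {instruction}\n\n### Response: "

For example, build_prompt("์˜ํ™” ํ•ด๋ฆฌํฌํ„ฐ ์‹œ๋ฆฌ์ฆˆ ๋ฐฐ๊ธ‰์‚ฌ๊ฐ€ ์–ด๋””์•ผ") produces the prompt string used in the Usage section below.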

Example (the instruction asks, in Korean, which company distributed the Harry Potter film series):

### instruction: ์˜ํ™” ํ•ด๋ฆฌํฌํ„ฐ ์‹œ๋ฆฌ์ฆˆ ๋ฐฐ๊ธ‰์‚ฌ๊ฐ€ ์–ด๋””์•ผ

### Response:

Model output:

### instruction: ์˜ํ™” ํ•ด๋ฆฌํฌํ„ฐ ์‹œ๋ฆฌ์ฆˆ ๋ฐฐ๊ธ‰์‚ฌ๊ฐ€ ์–ด๋””์•ผ

### Response: ์›Œ๋„ˆ ๋ธŒ๋ผ๋”์Šค\n๊ทธ ์™ธ์—๋„ ํ‚ค๋„ค์Šค ๊ฐ™์€ ์ž‘์€ ์—ญํ• ์„ ๋งก์€ ์‚ฌ๋žŒ๋“ค๋„ ์žˆ์—ˆ๋‹ค. ๋ฐ์ด๋น„๋“œ ์•Œ๋ฐ”๋ ˆ์ฆˆ๋Š” ๋ณผ๋“œ๋ชจํŠธ ๊ตฐ๋‹จ์˜ ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต ์ค‘ ํ•˜๋‚˜์ธ ํ•„๋ฆฝ์Šค๋ถ€๋ฅด๊ทธ ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต์˜ ๊ต์žฅ์ด๋ฉฐ, ํ•„๋ฆฌ์šฐ์Šค ํ”Œ๋ฆฌํŠธ์šฐ๋“œ๋Š” ํ•„๋ฆฝ์Šค๋ถ€๋ฅด๊ทธ ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต์˜ ๊ต์‚ฌ์ด๋‹ค. ํ•„๋ฆฝ์Šค๋ถ€๋ฅด๊ทธ๋Š” ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต ์ค‘ ํ•˜๋‚˜๋กœ, ๋ณด๋กœ๋ฏธ๋ฅด ์ „์„ค์— ๋”ฐ๋ฅด๋ฉด ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต์˜ ๊ต์žฅ๋“ค์ด ์ฃฝ์œผ๋ฉด ์„ธ๊ณ„๋ฅผ ๋ฉธ๋ง์‹œํ‚จ๋‹ค๋Š” ์ ์„ ์•Œ๊ณ  ์žˆ๋‹ค. ใ€Šํ•ด๋ฆฌ ํฌํ„ฐ์™€ ํ˜ผํ˜ˆ ์™•์žใ€‹์—์„œ ๋ฐ์Šค ์ดํ„ฐ๋Š” ํ•ด๋ฆฌ ํฌํ„ฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ  ํ•˜์ง€๋งŒ ์‹คํŒจํ•˜๊ณ  ํ•ด๋ฆฌ ํฌํ„ฐ๋Š” ๋ฐ์Šค ์ดํ„ฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ  ํ•˜์ง€๋งŒ ๋˜ ์‹คํŒจํ•œ๋‹ค. ใ€Šํ•ด๋ฆฌ ํฌํ„ฐ์™€ ์•„์ฆˆ์นด๋ฐ˜์˜ ์ฃ„์ˆ˜ใ€‹์—์„œ ๋ฐ์Šค ์ดํ„ฐ๋Š” ์•„์ฆˆ์นด๋ฐ˜์˜ ์ฃ„์ˆ˜๋กœ ๋“ฑ์žฅํ•˜์—ฌ ํ•ด๋ฆฌ ํฌํ„ฐ์—๊ฒŒ ๋ณต์ˆ˜๋ฅผ ํ•˜๊ณ ์ž ํ•˜์ง€๋งŒ ํ•ด๋ฆฌ ํฌํ„ฐ์™€ ๋ก  ์œ„์ฆ๋ฆฌ์˜ ๋ฐœ ๋น ๋ฅธ ๋Œ€์ฒ˜๋กœ ์‹คํŒจํ•˜๊ณ  ํ•ด๋ฆฌ ํฌํ„ฐ์™€ ๋ก  ์œ„์ฆ๋ฆฌ๋Š” ๋ฐ์Šค ์ดํ„ฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ  ํ•˜์ง€๋งŒ ์‹คํŒจํ•˜๊ณ  ๊ทธ ๊ฒฐ๊ณผ ๋ฐ์Šค ์ดํ„ฐ๋Š” ๋‹ค์‹œ ๊ธฐ์ €์Šน์— ๋ด‰์ธ๋œ๋‹ค. ใ€Šํ•ด๋ฆฌ ํฌํ„ฐ์™€ ์ฃฝ์Œ์˜ ์„ฑ๋ฌผใ€‹์—์„œ ๋ฐ์Šค ์ดํ„ฐ๋Š” ๋ถ€ํ™œํ•˜์—ฌ ์ฃฝ์Œ์˜ ์„ฑ๋ฌผ 1๋ถ€์— ๋“ฑ์žฅํ•˜์˜€์œผ๋ฉฐ, ํ•ด๋ฆฌ ํฌํ„ฐ์™€ ์ฃฝ์Œ์˜ ์„ฑ๋ฌผ 2๋ถ€์—์„œ ๋ฐ์Šค ์ดํ„ฐ๋Š” ์ฃฝ์Œ์˜ ์„ฑ๋ฌผ 2๋ถ€์˜ ๋‚ด์šฉ์„ ๋ฐฉํ•ดํ•˜๋Š” ๊ฐ„์ฒฉ ์—ญํ• ์„ ํ•œ๋‹ค. ๋ฐ์Šค ์ดํ„ฐ๋Š” ์˜ํ™” ํ•ด๋ฆฌํฌํ„ฐ์™€ ์ฃฝ์Œ์˜ ์„ฑ๋ฌผ 1๋ถ€์—์„œ ๋ฐ์Šค ์ดํ„ฐ์˜ ์—ญํ• ์„ ๋งก์€ ๋ฐฐ์šฐ ์Šคํ‹ฐ๋ธ ํ”ผ์นด๋“œ๊ฐ€ ์—ฐ๊ธฐํ•œ๋‹ค.

Usage

Clone the repository from GitHub and install the requirements:

git clone https://github.com/davidkim205/komt
cd komt
pip install -r lora/requirements_lora.txt
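Then run the following Python code, which loads the 4-bit quantized base model, attaches the LoRA adapter, and streams a generated answer: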
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import StoppingCriteria, StoppingCriteriaList
from transformers import TextStreamer, GenerationConfig
from peft import PeftModel, PeftConfig

class LocalStoppingCriteria(StoppingCriteria):
    """Stop generation as soon as the decoded output ends with a stop word."""

    def __init__(self, tokenizer, stop_words=[]):
        super().__init__()

        stops = [tokenizer(stop_word, return_tensors='pt', add_special_tokens=False)['input_ids'].squeeze()
                 for stop_word in stop_words]
        print('stop_words', stop_words)
        print('stop_words_ids', stops)
        self.stop_words = stop_words
        # Keep the stop token ids on the GPU, where generation runs in this example.
        self.stops = [stop.cuda() for stop in stops]
        self.tokenizer = tokenizer

    def _compare_token(self, input_ids):
        # Token-level check: does the tail of input_ids match one of the stop sequences?
        for stop in self.stops:
            if len(stop.size()) != 1:
                continue
            stop_len = len(stop)
            if torch.all((stop == input_ids[0][-stop_len:])).item():
                return True
        return False

    def _compare_decode(self, input_ids):
        # String-level check: does the decoded text end with one of the stop words?
        input_str = self.tokenizer.decode(input_ids[0])
        for stop_word in self.stop_words:
            if input_str.endswith(stop_word):
                return True
        return False

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs):
        # Called once per generated token; the string-level comparison is used here.
        return self._compare_decode(input_ids)

#
# config
peft_model_name = 'davidkim205/komt-Llama-2-13b-hf-lora'
model_name = 'davidkim205/komt-Llama-2-13b-hf'
instruction_prefix = "### instruction: "
input_prefix = "### input: "
answer_prefix = "### Response: "
endoftext = "<|end|>"
stop_words = [endoftext, '<s>', '###']
generation_config = GenerationConfig(
    temperature=0.9,
    top_p=0.7,
    top_k=100,
    max_new_tokens=2048,
    early_stopping=True,
    do_sample=True,
)
#
# create model
config = PeftConfig.from_pretrained(peft_model_name)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config,
                                             device_map="auto")
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
stopping_criteria = StoppingCriteriaList([LocalStoppingCriteria(tokenizer=tokenizer, stop_words=stop_words)])
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.eval()

#
# generate
prompt = f"### instruction: ์˜ํ™” ํ•ด๋ฆฌํฌํ„ฐ ์‹œ๋ฆฌ์ฆˆ ๋ฐฐ๊ธ‰์‚ฌ๊ฐ€ ์–ด๋””์•ผ\n\n### Response:"
gened = model.generate(
    **tokenizer(
        prompt,
        return_tensors='pt',
        return_token_type_ids=False
    ).to('cuda'),
    generation_config=generation_config,
    eos_token_id=model.config.eos_token_id,
    stopping_criteria=stopping_criteria,
    streamer=streamer
)
output_text = tokenizer.decode(gened[0], skip_special_tokens=True)

print('--------------------')
print(output_text)

response: ์›Œ๋„ˆ ๋ธŒ๋ผ๋”์Šค\n๊ทธ ์™ธ์—๋„ ํ‚ค๋„ค์Šค ๊ฐ™์€ ์ž‘์€ ์—ญํ• ์„ ๋งก์€ ์‚ฌ๋žŒ๋“ค๋„ ์žˆ์—ˆ๋‹ค. ๋ฐ์ด๋น„๋“œ ์•Œ๋ฐ”๋ ˆ์ฆˆ๋Š” ๋ณผ๋“œ๋ชจํŠธ ๊ตฐ๋‹จ์˜ ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต ์ค‘ ํ•˜๋‚˜์ธ ํ•„๋ฆฝ์Šค๋ถ€๋ฅด๊ทธ ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต์˜ ๊ต์žฅ์ด๋ฉฐ, ํ•„๋ฆฌ์šฐ์Šค ํ”Œ๋ฆฌํŠธ์šฐ๋“œ๋Š” ํ•„๋ฆฝ์Šค๋ถ€๋ฅด๊ทธ ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต์˜ ๊ต์‚ฌ์ด๋‹ค. ํ•„๋ฆฝ์Šค๋ถ€๋ฅด๊ทธ๋Š” ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต ์ค‘ ํ•˜๋‚˜๋กœ, ๋ณด๋กœ๋ฏธ๋ฅด ์ „์„ค์— ๋”ฐ๋ฅด๋ฉด ๋ณด๋กœ๋ฏธ๋ฅด 7๊ฐœ ํ•™๊ต์˜ ๊ต์žฅ๋“ค์ด ์ฃฝ์œผ๋ฉด ์„ธ๊ณ„๋ฅผ ๋ฉธ๋ง์‹œํ‚จ๋‹ค๋Š” ์ ์„ ์•Œ๊ณ  ์žˆ๋‹ค. ใ€Šํ•ด๋ฆฌ ํฌํ„ฐ์™€ ํ˜ผํ˜ˆ ์™•์žใ€‹์—์„œ ๋ฐ์Šค ์ดํ„ฐ๋Š” ํ•ด๋ฆฌ ํฌํ„ฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ  ํ•˜์ง€๋งŒ ์‹คํŒจํ•˜๊ณ  ํ•ด๋ฆฌ ํฌํ„ฐ๋Š” ๋ฐ์Šค ์ดํ„ฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ  ํ•˜์ง€๋งŒ ๋˜ ์‹คํŒจํ•œ๋‹ค. ใ€Šํ•ด๋ฆฌ ํฌํ„ฐ์™€ ์•„์ฆˆ์นด๋ฐ˜์˜ ์ฃ„์ˆ˜ใ€‹์—์„œ ๋ฐ์Šค ์ดํ„ฐ๋Š” ์•„์ฆˆ์นด๋ฐ˜์˜ ์ฃ„์ˆ˜๋กœ ๋“ฑ์žฅํ•˜์—ฌ ํ•ด๋ฆฌ ํฌํ„ฐ์—๊ฒŒ ๋ณต์ˆ˜๋ฅผ ํ•˜๊ณ ์ž ํ•˜์ง€๋งŒ ํ•ด๋ฆฌ ํฌํ„ฐ์™€ ๋ก  ์œ„์ฆ๋ฆฌ์˜ ๋ฐœ ๋น ๋ฅธ ๋Œ€์ฒ˜๋กœ ์‹คํŒจํ•˜๊ณ  ํ•ด๋ฆฌ ํฌํ„ฐ์™€ ๋ก  ์œ„์ฆ๋ฆฌ๋Š” ๋ฐ์Šค ์ดํ„ฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ  ํ•˜์ง€๋งŒ ์‹คํŒจํ•˜๊ณ  ๊ทธ ๊ฒฐ๊ณผ ๋ฐ์Šค ์ดํ„ฐ๋Š” ๋‹ค์‹œ ๊ธฐ์ €์Šน์— ๋ด‰์ธ๋œ๋‹ค. ใ€Šํ•ด๋ฆฌ ํฌํ„ฐ์™€ ์ฃฝ์Œ์˜ ์„ฑ๋ฌผใ€‹์—์„œ ๋ฐ์Šค ์ดํ„ฐ๋Š” ๋ถ€ํ™œํ•˜์—ฌ ์ฃฝ์Œ์˜ ์„ฑ๋ฌผ 1๋ถ€์— ๋“ฑ์žฅํ•˜์˜€์œผ๋ฉฐ, ํ•ด๋ฆฌ ํฌํ„ฐ์™€ ์ฃฝ์Œ์˜ ์„ฑ๋ฌผ 2๋ถ€์—์„œ ๋ฐ์Šค ์ดํ„ฐ๋Š” ์ฃฝ์Œ์˜ ์„ฑ๋ฌผ 2๋ถ€์˜ ๋‚ด์šฉ์„ ๋ฐฉํ•ดํ•˜๋Š” ๊ฐ„์ฒฉ ์—ญํ• ์„ ํ•œ๋‹ค. ๋ฐ์Šค ์ดํ„ฐ๋Š” ์˜ํ™” ํ•ด๋ฆฌํฌํ„ฐ์™€ ์ฃฝ์Œ์˜ ์„ฑ๋ฌผ 1๋ถ€์—์„œ ๋ฐ์Šค ์ดํ„ฐ์˜ ์—ญํ• ์„ ๋งก์€ ๋ฐฐ์šฐ ์Šคํ‹ฐ๋ธ ํ”ผ์นด๋“œ๊ฐ€ ์—ฐ๊ธฐํ•œ๋‹ค. ## Hardware and Software - nvidia driver : 535.54.03 - CUDA Version: 12.2

Training procedure

The following bitsandbytes quantization config was used during training (see the code sketch after this list):

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
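Expressed as code, the listed values correspond to the following BitsAndBytesConfig. This is a reconstruction from the list above, not the original training script.

import torch
from transformers import BitsAndBytesConfig

# Reconstruction of the quantization config listed above.
training_bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)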

Framework versions

  • PEFT 0.5.0.dev0