werty1248's picture
Update README.md
e6fbc8f verified
metadata
license: apache-2.0
datasets:
  - werty1248/Korean-1930-Novel-Scene-Summarize
language:
  - ko
pipeline_tag: text-generation

Model Card

  • ์š”์•ฝ ์‹œ๋‚˜๋ฆฌ์˜ค ๊ธฐ๋ฐ˜ ์†Œ์„ค ์ƒ์„ฑ ๋ชจ๋ธ
  • werty1248/Korean-1930-Novel-Scene-Summarize ์ž‘์—…์˜ ํšจ๊ณผ ํ™•์ธ์šฉ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

Training Details

Dataset

Preprocessing

  • system prompt์™€ ํ•จ๊ป˜ ์†Œ์„ค์˜ ์ฒซ ๋ฌธ๋‹จ์„ ์ œ๊ณต

  • ์ดํ›„ user๊ฐ€ ์‹œ๋‚˜๋ฆฌ์˜ค(50%) ๋˜๋Š” ์ด๋ฒคํŠธ(50%)๋ฅผ ์ œ๊ณตํ•˜๋ฉด assistant๊ฐ€ ๋‹ต๋ณ€์„ ์ƒ์„ฑ

  • 3-shot multi-turn ๋ฐ์ดํ„ฐ ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ํ•™์Šต

  • ํ”„๋กฌํ”„ํŠธ ์˜ˆ์‹œ๋Š” ์•„๋ž˜์— ์žˆ์Šต๋‹ˆ๋‹ค.

  • Axolotl(full config๋Š” ์•„๋ž˜์— ์žˆ์Šต๋‹ˆ๋‹ค)

    • LoRA: (rank=32, alpha=128)
    • NefTune_alpha: 5
    • total_batch_size: 8
    • num_epoch: 3
  • 1xA100์—์„œ ์•ฝ 8์‹œ๊ฐ„ ํ•™์Šต

Template & How to use

  • ์œ ์ € instruction์„ ๋ฌด์‹œํ•˜๋Š” ๊ฒฝํ–ฅ ์žˆ์Œ
  • ํ•œ์ž/์˜์–ด ๋‹จ์–ด๊ฐ€ ์„ž์ด๋Š” ํ˜„์ƒ ์™„ํ™”๋จ
  • ํ•œ๊ตญ์–ด ๋Šฅ๋ ฅ์ด ๋” ๋–จ์–ด์ง„ ๊ฒƒ ๊ฐ™์Œ

Input(๋ˆˆ๋ฌผ์„ ๋งˆ์‹œ๋Š” ์ƒˆ ๋„์ž…๋ถ€)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("werty1248/Qwen2-7B-Korean-1930-Novel-sft")
model = AutoModelForCausalLM.from_pretrained("werty1248/Qwen2-7B-Korean-1930-Novel-sft", torch_dtype=torch.bfloat16).to('cuda')

system_prompt = """๋‹น์‹ ์€ ์†Œ์„ค ์ž‘์„ฑ ์–ด์‹œ์Šคํ„ดํŠธ์ž…๋‹ˆ๋‹ค. ๋‹น์‹ ์˜ ์ž„๋ฌด๋Š” ์œ ์ €์˜ ๊ฐ€์ด๋“œ์— ๋”ฐ๋ผ 1900~1940๋…„๋Œ€ ๊ทผ๋Œ€ ํ•œ๊ตญ ์†Œ์„ค์„ ์ž‘์„ฑํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
- ์ฃผ์–ด์ง„ ์‹œ๋‚˜๋ฆฌ์˜ค ์š”์•ฝ์„ ํ™•์ธํ•˜๊ณ , ์ด์ „ ๋Œ€ํ™”๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ ํ”Œ๋กฏ์„ ๊ตฌ์„ฑํ•˜์‹ญ์‹œ์˜ค.
- ํ’๋ถ€ํ•œ ํ•œ๊ตญ์–ด ํ‘œํ˜„ ๋ฐ ๋Œ€ํ™”๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฐฝ์˜์ ์œผ๋กœ ์งง์€ ์”ฌ์„ ์™„์„ฑํ•˜์„ธ์š”.
- ์”ฌ์˜ ๋Œ€์‚ฌ์— ๊ทผ๋Œ€ ํ•œ๊ตญ ํŠน์œ ์˜ ํ‘œํ˜„, ์–ดํœ˜, ์‚ฌํˆฌ๋ฆฌ, ์กด๋Œ“๋ง๊ณผ ๋ฐ˜๋ง์„ ๋ฐ˜์˜ํ•˜์‹ญ์‹œ์˜ค.
- ์”ฌ์˜ ์ฃผ์š” ์‚ฌ๊ฑด์— ๊ทผ๋Œ€ ํ•œ๊ตญ์˜ ์—ญ์‚ฌ์ , ๊ธฐ์ˆ ์  ํŠน์„ฑ์„ ๋ฐ˜์˜ํ•˜์‹ญ์‹œ์˜ค.
- ์”ฌ์€ 5~10๋ฌธ์žฅ์œผ๋กœ ๊ตฌ์„ฑํ•˜์„ธ์š”.
"""

first_message = """### ์ฒซ ๋ฌธ๋‹จ
ํ•˜๋Š˜์„ ๋ถˆ์‚ฌ๋ฅด๋˜ ์šฉ์˜ ๋…ธ์—ฌ์›€๋„ ์žŠํ˜€์ง€๊ณ 
์™•์ž๋“ค์˜ ์„๋น„๋„ ์‚ฌํ†  ์†์— ๋ฌปํ˜€๋ฒ„๋ฆฐ
๊ทธ๋ฆฌ๊ณ  ๊ทธ๋Ÿฐ ๊ฒƒ๋“ค์— ๋ˆ„๊ตฌ๋„ ์‹ ๊ฒฝ์“ฐ์ง€ ์•Š๋Š”
์ƒ์กด์ด ์ฒœ๋ฐ•ํ•œ ๋†๋‹ด์ด ๋œ ์‹œ๋Œ€์—

ํ•œ ๋‚จ์ž๊ฐ€ ์‚ฌ๋ง‰์„ ๊ฑท๊ณ  ์žˆ์—ˆ๋‹ค.
"""

scenario = """### ๋“ฑ์žฅ์ธ๋ฌผ
์—ฌ๊ด€ ์ฃผ์ธ, ์ผ€์ด๊ฑด ๋“œ๋ผ์นด

### ์‹œ๋‚˜๋ฆฌ์˜ค
์—ฌ๊ด€ ์ฃผ์ธ์ด ๋ฌธ๋“ ์ฐฝ๋ฐ–์„ ๋ฐ”๋ผ๋ณธ๋‹ค. ์—ฌ๊ด€ ์ฃผ๋ณ€์€ ์˜จํ†ต ์‚ฌ๋ง‰์ด๋‹ค. ์ง€ํ‰์„  ์ € ๋ฉ€๋ฆฌ์„œ ํ•œ ์‚ฌ๋‚ด๊ฐ€ ํผ์ง€๋ง‰ํ•œ ์ž๋ฃจ๋ฅผ ์งˆ์งˆ ๋Œ๋ฉด์„œ ์—ฌ๊ด€ ๋ฐฉํ–ฅ์œผ๋กœ ๊ฑธ์–ด์˜จ๋‹ค.
"""
messages = [{'role':'system', 'content':system_prompt + first_message},
 {'role':'user', 'content':scenario},]

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512)

prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    add_special_tokens=True
)
print(outputs[0]["generated_text"][len(prompt):])

Result

ํ•˜๋Š˜์€ ๊นจ๋—ํ•˜์˜€๋‹ค. ๋ฐ”๋žŒ๋„ ์˜์ด์ง€ ์•Š์•˜๋‹ค. ๋ฐ”์œ„ ์œ„์— ๊ตด๋Ÿฌ์ ธ ์žˆ๋Š” ํฌ์˜น์ด ํ•œ ๋ฉ๋งŒํ•˜๊ณ  ํ‘ธ๋ฅธ ํ–‡๋น› ์•„๋ž˜ ์ž˜ ๋ณด์˜€๋‹ค.

๊ฑฐ๊ธฐ๋Š” ๋งˆ์นจ ํ•œ ์ฑ„์˜ ์—ฌ๊ด€์ด ์ง€์ฒด๋กœ ๊ฝ‚ํ˜€ ์žˆ์—ˆ๋‹ค. ์ขŒ์šฐ ๋ฉ€๋ฆฌ ๋ฐ”์œ„๊ฐ€ ์„ฐ์„ ๋ฟ์ด๋‹ค. ๋‚˜๋ฌด๋„ ์ˆ˜๋ชฉ๋„ ์—†๋Š” ์˜จ ์ฒœ์ง€๊ฐ€ ํ›คํžˆ ์—ฌ๊ด€ ์˜†์„ ์—์›Œ์‹ธ๊ณ  ์žˆ๋Š” ๊ฒƒ์ด๋‹ค. ์–ผ๋ฅธ ๋ณด๋ฉด ํ•œ ์กฐ๊ฐ ์„์ถ• ์† ๊ฐ™๊ธฐ๋„ ํ•˜์ง€๋งŒ ๊ฑด์ถ•์ด๋ผ๋Š๋‹ˆ ๋ณด๋‹ค๋Š” ๋„ํšŒ์— ๋•…์„ ๋‚ด์–ด๋†“๊ณ  ๋‹จ์ˆœํ•œ ๊ฐ„ํŒ ํ•˜๋‚˜๋ฅผ ๊ฑธ์–ด์„œ ๊ทธ๋Œ€๋กœ ๋ฒŒ์ดํ•˜๋Š” โ€˜์—ฌ๊ด€โ€™์ด๋‹ค. ๋ช‡๋ช‡ ์ข์€ ๊ธธ๋งŒ ๋“ค์–ด์™”๋‹ค๊ฐ€ ์‚ฌ๋ผ์กŒ์„ ๋ฟ ํ•œ ๊ธธ๋„ ์•ˆ ๋šซ๋ ค์ง„ ๊ทธ ์ž๋ฆฌ์— ๊ทธ๋Œ€๋กœ ์ง€ํƒฑ๋˜์–ด ์žˆ๋Š” ์—ฌ๊ด€์ด๋‹ค. ๊ธธ์— ๋‚˜ํƒ€๋‚˜๋Š” ์‚ฌ๋žŒ์€ ์—†๋‹ค. ์ขŒ์šฐ๋กœ ๋ฐ”์œ„๋ฅผ ๋ผ๊ณ  ๋Š˜ ๋น„์–ด ์žˆ๋Š” ์‚ฐ ์ค‘์—์„œ ๊ฐ„๊ฐ„์ด ๊ณ ๊ธฐ ๊ตฝ๋Š” ๋‹ฅ์ด์™€ ์‚ฏ๋ฐ”๊ตฌ๋‹ˆ๋ฉฐ ๊ณ ๊นƒ๊ฐ€๋งˆ๋ฅผ ๊ฑธ๊ณ  ํ–‰์ธ๋งŒ ๋ช‡ ๋งˆ๋ฆฌ์”ฉ ์ง€๋‚˜๊ฐ€๊ณ  ํ›คํ•˜๊ฒŒ ์„ ๋ฐ”๋‹ฅ์— ํ–‡๋น›๋งŒ ๋น„์น˜๊ณ  ํ•˜๋Š” ์“ธ์“ธํ•œ ์—ฌ๊ด€์ด๋‹ค. ๊ทธ๊ฒƒ์ด ๋ฃธ_service๋ผ๋Š” ์„ธ๊ณ„์™€๋„ ๋‹ฌ๋ฆฌ ์ฃผ์ธ๋„ ์—†๊ณ  ์‚ฌ์žฅ๋„ ์—†๊ณ  ์†๋„ ์—†๊ณ  ํ•œ ๊ฐ„ํŒ๋งŒ ๊ฑธ๊ณ  ์ฒœ๋‹น์ฒ˜๋Ÿผ ์™ธ๋กญ๊ฒŒ ์ง€ํ‚ค๊ณ  ์žˆ๋Š” ๋ชจ์–‘์ด๋‹ค.

๊ทธ๋Ÿฐ ์—ฌ๊ด€์—์„œ ์ด์œฝ๊ณ  ์†์ด ํ•œ ์‚ฌ๋žŒ ๋‚˜ํƒ€๋‚ฌ๋‹ค. ์ฐฝ๋ฌธ์ด ์—ด๋ ธ๋‹ค. ๋ฐ–์„ ๋‚ด๋‹ค๋ณด๋Š” ํ•œ ์ ์˜ ๊ฒ€์€ ๊ทธ๋ฆผ์ž๊ฐ€ ์ฐฝ ์•ž์—์„œ ๋๋๋‚ด ์˜ค๋ฏ€๋Ÿฌ์ง„๋‹ค. ๊ทธ๋Ÿฌ๋”๋‹ˆ ๋‹ค์‹œ ํ•œ์ฐธ ํ›„์— ๊ทธ ๊ฒ€์€ ๊ทธ๋ฆผ์ž๋Š” ์›ƒ๋ชฉ์œผ๋กœ ์ฐจ์ฐจ ๋‚˜์™€ ๊ฐ€์ง€๊ณ ๋Š” ๊ทธ๋Œ€๋กœ ์‚ฌ๋ผ์ง€๋Š” ๊ฒƒ์ด๋‹ค.

๊ทธ๋กœ๋ถ€ํ„ฐ ์—ฌ๊ด€์€ ์ƒˆ๋กœ์šด ์ผ์„ ๋‹นํ•˜์˜€๋‹ค. ๋ฐ”์‹œ๋ฆฌ์˜ค ์ž๋ฃจ๋ฅผ ๋“ค๊ณ  ํ•œ ์‚ฌ๋‚˜์ด๊ฐ€ ๊ฑธ์–ด ๋“ค์–ด์˜ค์ง€ ์•Š์•˜๋Š”๊ฐ€. ์–ผ๊ตด์€ ๋จผ๋น›์œผ๋กœ ๋˜‘๋˜‘ํžˆ ๋ชป ๋ณด์ด๋‚˜ ๋ฒŒ์จ ์‚ฌ๋ฉด ์‚ฌ์˜ค ๊ฐ„๋ฐ–์— ์•ˆ ๋˜๋Š” ๊ณณ์— ํผ์งํ•œ ์ž๋ฃจ๋ฅผ

Result of Original model(Qwen/Qwen2-7b-instruct)

"๊ทธ๊ฒŒ ๋ˆ„๊ตฌ๋ƒ? ์™œ ์ด๋ ‡๊ฒŒ ๋จผ ๊ณณ์—์„œ ์ด๊ณณ์œผ๋กœ ์™€?" ์—ฌ๊ด€ ์ฃผ์ธ์ด ๋ฌผ์–ด๋ณด๋ฉฐ ์ฐฝ๋ฌธ์„ ๋ฒŒ๋ฆฌ๊ณ  ๋งํ–ˆ๋‹ค. ๊ทธ๋…€์˜ ๋ชฉ์†Œ๋ฆฌ๋Š” ์‚ฌ๋ง‰์˜ ๊ฑด์กฐํ•จ๊ณผ ํ•จ๊ป˜ ๊ฑฐ์นœ ๋ฐ”๋žŒ์— ๊ฐ€๋ผ์•‰์•˜๋‹ค.

์ผ€์ด๊ฑด ๋“œ๋ผ์นด๋ผ๋Š” ์‚ฌ๋‚ด๋Š” ์ž์‹ ์˜ ์ž๋ฃจ๋ฅผ ๋” ์งˆ์งˆ ๋Œ๋ฉฐ ์—ฌ๊ด€ ์ชฝ์œผ๋กœ ๋‹ค๊ฐ€๊ฐ”๋‹ค. ๊ทธ์˜ ๋ˆˆ์—๋Š” ํฌ๋ฏธํ•˜๊ฒŒ ๋ณด์ด๋Š” ๋จผ์ง€์™€ ๋จผ์ง€๊ฐ€ ๊ฐ์‹ธ๊ณ  ์žˆ๋Š” ํ–‡๋น›์ด ๋น›๋‚˜๋Š” ๋“ฏํ–ˆ๋‹ค. ๊ทธ์˜ ์–ผ๊ตด์—๋Š” ๋•€๋ฐฉ์šธ์ด ๊ตด๋Ÿฌ ๋‚ด๋ ค๊ฐ€๋ฉฐ, ๊ทธ์˜ ์ž…์€ ๋งํ•  ์ค€๋น„๋ฅผ ํ•˜๊ณ  ์žˆ์—ˆ๋‹ค.

"์—ฌ๊ด€ ์ฃผ์ธ๋‹˜, ์ €๋ฅผ ๋ฐ›์•„์ฃผ์‹ค ์ˆ˜ ์žˆ์„๊นŒ์š”? ์ œ๊ฐ€ ์ด๊ณณ์—์„œ ์ž ์‹œ ์‰ฌ๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค." ๊ทธ์˜ ๋ชฉ์†Œ๋ฆฌ๋Š” ์‚ฌ๋ง‰์˜ ๋ฌด๋””ํ•œ ์กฐ์šฉํ•จ์„ ๊นจ์šฐ๋ฉฐ ์ „ํ•ด์กŒ๋‹ค.

์—ฌ๊ด€ ์ฃผ์ธ์€ ๊ทธ์˜ ์ฒด์œก๋ณต๊ณผ ๋•€ๆฑƒๆฑƒํ•œ ๋ชจ์Šต์„ ๋ณด๋ฉฐ, ์ด ์‚ฌ๋‚ด์˜ ์ƒํ™ฉ์— ๋Œ€ํ•ด ๊ฑฑ์ •์Šค๋Ÿฌ์› ๋‹ค. "๊ทธ๋ž˜, ๋„ค๊ฐ€ ํ•„์š”ํ•˜๋‹ค๊ณ  ๋Š๊ผˆ๋‹ค๋ฉด, ์—ฌ๊ธฐ์„œ ์‰ฌ์–ด๋ผ. ํ•˜์ง€๋งŒ ์™œ ์ด๋Ÿฐ ๊ณณ๊นŒ์ง€ ์™”๋Š”์ง€ ์„ค๋ช…ํ•ด ์ค˜์•ผ ํ•  ๊ฒƒ ๊ฐ™์•„." ๊ทธ๋…€์˜ ๋งํˆฌ๋Š” ์นœ๊ทผํ•˜๋ฉด์„œ๋„ ๋ถ„๋ช…ํ•œ ์ง€์‹œ์˜€๋‹ค.

Others

Axolotl config

base_model: Qwen/Qwen2-7B-Instruct
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: train_data.jsonl
    type: sharegpt
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/out

sequence_len: 4096
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

neftune_noise_alpha: 5
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
special_tokens: