---
library_name: peft
---

# WIP

## 1. Usage procedure

* Load the base model and the PEFT adapter parameters

```
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, GPTQConfig

# GPTQ-quantized base model that the adapter was trained on
model_id = "TheBloke/WizardLM-13B-V1.2-GPTQ"

# Adapter configuration for the PEFT weights
config = PeftConfig.from_pretrained("a2ran/GPTeacher_ko_llama2_13B")
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
quantization_config_loading = GPTQConfig(bits=4, disable_exllama=True)

# Load the 4-bit base model, then attach the PEFT adapter weights on top
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config_loading,
                                             torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, "a2ran/GPTeacher_ko_llama2_13B")
```
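The `config` object loaded above is not strictly required for inference, since `PeftModel.from_pretrained` resolves the adapter configuration on its own, but it can serve as a quick sanity check. A minimal sketch:

```
# Optional sanity check on the downloaded adapter configuration
print(config.peft_type)                # adapter type recorded in the config
print(config.base_model_name_or_path)  # base model the adapter was trained against
```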

* Generate tokens

```
from transformers import TextStreamer

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer)

# Your input sentence goes here (the example asks, in Korean,
# "Explain the American administrative system.")
prompt = """
### input @ 미국의 행정시스템에 대해 설명해줘.\n\n### response @"""

# device_map="auto" already placed the model on the GPU, so call generate
# directly, then strip the prompt from the decoded output
output = tokenizer.decode(model.generate(
    **tokenizer(
        prompt,
        return_tensors='pt',
    ).to(0),
    max_new_tokens = 2048,
    temperature = 1.2,
    top_p = 0.7,
    early_stopping = True,
    eos_token_id = 2,
    do_sample = True,
    repetition_penalty = 1.1,
    streamer = streamer
)[0]).replace(prompt + " ", "")
```
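For repeated queries, the generation call can be wrapped in a small helper. This is a minimal sketch that assumes the `model`, `tokenizer`, and `streamer` objects created above; the `ask` function itself is illustrative and not part of the released code:

```
def ask(question: str) -> str:
    # Wrap the question in the same "### input @ ... ### response @" template
    prompt = f"""
### input @ {question}\n\n### response @"""
    output_ids = model.generate(
        **tokenizer(prompt, return_tensors='pt').to(0),
        max_new_tokens=2048,
        temperature=1.2,
        top_p=0.7,
        do_sample=True,
        repetition_penalty=1.1,
        eos_token_id=2,
        streamer=streamer,
    )[0]
    # Decode, then strip the prompt so only the model's answer remains
    return tokenizer.decode(output_ids).replace(prompt + " ", "")
```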

## 2. Training procedure

The following GPTQ quantization config was used during training (the same settings are sketched as a `GPTQConfig` call after the list):

- quant_method: gptq
- bits: 4
- tokenizer: None
- dataset: None
- group_size: 128
- damp_percent: 0.1
- desc_act: False
- sym: True
- true_sequential: True
- use_cuda_fp16: False
- model_seqlen: None
- block_name_to_quantize: None
- module_name_preceding_first_block: None
- batch_size: 1
- pad_token_id: None
- disable_exllama: True
- max_input_length: None
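For reference, the non-default values above map onto a `transformers` `GPTQConfig` as sketched below. This is illustrative only (the `training_gptq_config` name is made up here); fields listed as `None` are simply left at their defaults:

```
from transformers import GPTQConfig

# GPTQ settings matching the training-time list above
training_gptq_config = GPTQConfig(
    bits=4,
    group_size=128,
    damp_percent=0.1,
    desc_act=False,
    sym=True,
    true_sequential=True,
    use_cuda_fp16=False,
    batch_size=1,
    disable_exllama=True,
)
```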

### Framework versions

- PEFT 0.6.0.dev0