---
library_name: peft
---

# WIP

## 1. Usage procedure

* Load the quantized base model and the PEFT adapter weights

```
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, GPTQConfig

model_id = "TheBloke/WizardLM-13B-V1.2-GPTQ"
config = PeftConfig.from_pretrained("a2ran/GPTeacher_ko_llama2_13B")
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

quantization_config_loading = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config_loading,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "a2ran/GPTeacher_ko_llama2_13B")
```

* How to generate tokens

```
from transformers import TextStreamer

streamer = TextStreamer(tokenizer)

# place your input sentence here
input = """
### input @ 미국의 행정시스템에 대해 설명해줘.\n\n### response @"""

# the model is already dispatched to GPU via device_map="auto"
output = tokenizer.decode(
    model.generate(
        **tokenizer(input, return_tensors="pt").to(0),
        max_new_tokens=2048,
        temperature=1.2,
        top_p=0.7,
        early_stopping=True,
        eos_token_id=2,
        do_sample=True,
        repetition_penalty=1.1,
        streamer=streamer,
    )[0]
).replace(input + " ", "")
```

## 2. Training procedure

The following GPTQ quantization config was used during training:
- quant_method: gptq
- bits: 4
- tokenizer: None
- dataset: None
- group_size: 128
- damp_percent: 0.1
- desc_act: False
- sym: True
- true_sequential: True
- use_cuda_fp16: False
- model_seqlen: None
- block_name_to_quantize: None
- module_name_preceding_first_block: None
- batch_size: 1
- pad_token_id: None
- disable_exllama: True
- max_input_length: None

### Framework versions

- PEFT 0.6.0.dev0
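
For reference, the training-time quantization settings listed above map onto a `GPTQConfig` roughly as sketched below. This is a minimal reconstruction from the list, not the exact training script; fields reported as `None` are omitted and fall back to their defaults.

```
from transformers import GPTQConfig

# Sketch of the GPTQ config used during training, reconstructed from the
# values listed in the training procedure; None-valued fields use defaults.
training_quant_config = GPTQConfig(
    bits=4,
    group_size=128,
    damp_percent=0.1,
    desc_act=False,
    sym=True,
    true_sequential=True,
    use_cuda_fp16=False,
    batch_size=1,
    disable_exllama=True,
)
```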