---
license: other
language:
- en
pipeline_tag: text2text-generation
tags:
- alpaca
- llama
- chat
- gpt4
inference: false
---
This is a 4-bit, groupsize-128 GPTQ quantisation of chansung's gpt4-alpaca-lora-13b.
## How to easily download and use this model in text-generation-webui
- Open the text-generation-webui UI as normal.
- Click the Model tab.
- Under Download custom model or LoRA, enter `TheBloke/gpt4-alpaca-lora-13B-GPTQ-4bit-128g`.
- Click Download.
- Wait until it says it's finished downloading.
- Click the Refresh icon next to Model in the top left.
- In the Model drop-down, choose the model you just downloaded: `gpt4-alpaca-lora-13B-GPTQ-4bit-128g`.
- If you see an error in the bottom right, ignore it - it's temporary.
- Check that the GPTQ parameters are correct on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`.
- Click Save settings for this model in the top right.
- Click Reload the Model in the top right.
- Once it says it's loaded, click the Text Generation tab and enter a prompt!
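If you prefer to download the files outside the UI, the repository can also be cloned directly; a minimal sketch, assuming git-lfs is installed:

```
# Fetch the model files with git-lfs instead of the webui downloader
git lfs install
git clone https://huggingface.co/TheBloke/gpt4-alpaca-lora-13B-GPTQ-4bit-128g
```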
The command used to create the GPTQ was:

```
CUDA_VISIBLE_DEVICES=0 python3 llama.py /content/gpt4-alpaca-lora-13B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors /content/gpt4-alpaca-lora-13B-GPTQ-4bit-128g.safetensors
```
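As a quick sanity check that quantisation produced a loadable checkpoint, the output file can be deserialized with the `safetensors` Python package (a sketch; the path matches the command above):

```
# Optional check: confirm the quantised checkpoint deserializes and count its tensors
python -c "from safetensors.torch import load_file; sd = load_file('/content/gpt4-alpaca-lora-13B-GPTQ-4bit-128g.safetensors'); print(len(sd), 'tensors')"
```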
Command to clone the latest Triton GPTQ-for-LLaMa repo for inference using `llama_inference.py`, or in text-generation-webui:
```
# Clone text-generation-webui, if you don't already have it
git clone https://github.com/oobabooga/text-generation-webui
# Make a repositories directory
mkdir -p text-generation-webui/repositories
cd text-generation-webui/repositories
# Clone the latest GPTQ-for-LLaMa code inside text-generation-webui
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
```
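Once cloned, the quantised model can also be tested directly with `llama_inference.py`; the exact flags can shift between GPTQ-for-LLaMa revisions, so treat this as a hedged sketch and check the repo's README for the current interface:

```
# Sketch: generate text with the quantised checkpoint via GPTQ-for-LLaMa
# (paths are placeholders; flags mirror the quantisation command above)
CUDA_VISIBLE_DEVICES=0 python llama_inference.py /path/to/gpt4-alpaca-lora-13B-HF \
    --wbits 4 --groupsize 128 \
    --load /path/to/gpt4-alpaca-lora-13B-GPTQ-4bit-128g.safetensors \
    --text "Write a short poem about llamas."
```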
There is also a `no-act-order.safetensors` file, which will work with oobabooga's fork of GPTQ-for-LLaMa; it does not require the latest GPTQ code.
## Original model card
This repository comes with a LoRA checkpoint to make LLaMA into a chatbot-like language model. The checkpoint is the output of an instruction-following fine-tuning process with the following settings on an 8xA100 (40G) DGX system.
- Training script: borrowed from the official Alpaca-LoRA implementation
- Training command:

```
python finetune.py \
    --base_model='decapoda-research/llama-30b-hf' \
    --data_path='alpaca_data_gpt4.json' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --output_dir='./gpt4-alpaca-lora-30b' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16 \
    --batch_size=... \
    --micro_batch_size=...
```
You can see how the training went in the W&B report here.
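To try the LoRA checkpoint itself (rather than the merged GPTQ weights), the Alpaca-LoRA repo provides a `generate.py` that applies LoRA weights on top of a base model; a hedged sketch, assuming the checkpoint is published as `chansung/gpt4-alpaca-lora-13b`:

```
# Sketch: apply the LoRA to the base model with Alpaca-LoRA's generate.py
# (the lora_weights repo id is an assumption based on this card)
python generate.py \
    --load_8bit \
    --base_model 'decapoda-research/llama-13b-hf' \
    --lora_weights 'chansung/gpt4-alpaca-lora-13b'
```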