license: other
library_name: transformers
pipeline_tag: text-generation
datasets:
- RyokoAI/ShareGPT52K
- Hello-SimpleAI/HC3
tags:
- koala
- ShareGPT
- llama
- gptq
inference: false
Koala: A Dialogue Model for Academic Research
This repo contains the weights of the Koala 13B model produced at Berkeley. It is the result of combining the diffs from https://huggingface.co/young-geng/koala with the original Llama 13B model.
This version has then been quantized to 4-bit and 5-bit GGML for use with llama.cpp.
My Koala repos
I have the following Koala model repositories available:
13B models:
- Unquantized 13B model in HF format
- GPTQ quantized 4bit 13B model in pt and safetensors formats
- 4bit and 5bit models in GGML format for llama.cpp
7B models:
- Unquantized 7B model in HF format
- Unquantized 7B model in GGML format for llama.cpp
- GPTQ quantized 4bit 7B model in pt and safetensors formats
- 4bit and 5bit models in GGML format for llama.cpp
THE FILES IN THE MAIN BRANCH REQUIRE THE LATEST LLAMA.CPP (May 19th 2023 - commit 2d5db48)!
llama.cpp recently made another breaking change to its quantisation methods - https://github.com/ggerganov/llama.cpp/pull/1508
I have quantised the GGML files in this repo with the latest version. Therefore you will require llama.cpp compiled on May 19th or later (commit 2d5db48 or later) to use them.
For files compatible with the previous version of llama.cpp, please see branch previous_llama_ggmlv2.
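If you are unsure which llama.cpp build a downloaded file needs, one option is to look at its header. The sketch below assumes the GGJT container layout that llama.cpp used around this time: a little-endian 32-bit magic (`ggjt`) followed by a 32-bit format version, with version 3 corresponding to the May 19th change. That layout, and the helper names `ggml_file_version` and `describe`, are my assumptions rather than anything shipped in this repo, so treat it as a quick sanity check only.

```python
import struct

# 'ggjt' read as a little-endian uint32 (assumed header layout, see above)
GGJT_MAGIC = 0x67676A74


def ggml_file_version(path):
    """Return the GGJT format version stored in the file header, or None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        return None
    magic, version = struct.unpack("<II", header)
    return version if magic == GGJT_MAGIC else None


def describe(version):
    """Translate a header version into the llama.cpp requirement it implies."""
    if version is None:
        return "not a GGJT file (or an older GGML container)"
    if version >= 3:
        return "needs llama.cpp from May 19th 2023 (commit 2d5db48) or later"
    return "predates the May 19th 2023 format; use an older llama.cpp build"
```

For example, `describe(ggml_file_version("koala-13B-4bit-128g.GGML.bin"))` would report which side of the breaking change the file is on, provided the header assumption holds.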
How to run in llama.cpp
I use the following command line; adjust for your tastes and needs:
./main -t 18 -m koala-13B-4bit-128g.GGML.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "BEGINNING OF CONVERSATION:
USER: <PROMPT GOES HERE>
GPT:"
Change -t 18 to the number of physical CPU cores you have. For example, if your system has 8 cores, 16 threads, use -t 8.
You will require 16GB or more RAM to run this model without swapping.
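If you would rather drive llama.cpp from a script, the command above can be assembled programmatically. This is a minimal sketch: the flags and the prompt layout are copied from the command line shown above, while the function names `koala_prompt` and `llama_cpp_command` and the example question are hypothetical. You still need to supply your own physical core count.

```python
import shlex


def koala_prompt(user_message):
    """Wrap a single user message in the prompt layout used above."""
    return f"BEGINNING OF CONVERSATION:\nUSER: {user_message}\nGPT:"


def llama_cpp_command(model_path, user_message, physical_cores, binary="./main"):
    """Build the llama.cpp argument list with the same flags as the command above."""
    return [
        binary,
        "-t", str(physical_cores),   # physical cores, not hardware threads
        "-m", model_path,
        "--color",
        "-c", "2048",
        "--temp", "0.7",
        "--repeat_penalty", "1.1",
        "-n", "-1",
        "-p", koala_prompt(user_message),
    ]


if __name__ == "__main__":
    cmd = llama_cpp_command(
        "koala-13B-4bit-128g.GGML.bin",
        "Write a haiku about koalas.",   # hypothetical example question
        physical_cores=8,
    )
    # Print a copy-pasteable shell command; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with the multi-line prompt.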
How the Koala delta weights were merged
The Koala delta weights were originally merged using the following commands, producing koala-13B-HF:
git clone https://github.com/young-geng/EasyLM
git clone https://huggingface.co/TheBloke/llama-13b
mkdir koala_diffs && cd koala_diffs && wget https://huggingface.co/young-geng/koala/resolve/main/koala_13b_diff_v2
cd ../EasyLM
# Convert the base LLaMA checkpoint to EasyLM format
PYTHONPATH="${PWD}:$PYTHONPATH" python \
    -m EasyLM.models.llama.convert_torch_to_easylm \
    --checkpoint_dir=/content/llama-13b \
    --output_file=/content/llama-13b-LM \
    --streaming=True
# Apply the Koala diff to the base weights
PYTHONPATH="${PWD}:$PYTHONPATH" python \
    -m EasyLM.scripts.diff_checkpoint --recover_diff=True \
    --load_base_checkpoint='params::/content/llama-13b-LM' \
    --load_target_checkpoint='params::/content/koala_diffs/koala_13b_diff_v2' \
    --output_file=/content/koala_13b.diff.weights \
    --streaming=True
# Convert the merged weights to HF format
PYTHONPATH="${PWD}:$PYTHONPATH" python \
    -m EasyLM.models.llama.convert_easylm_to_hf --model_size=13b \
    --output_dir=/content/koala-13B-HF \
    --load_checkpoint='params::/content/koala_13b.diff.weights' \
    --tokenizer_path=/content/llama-13b/tokenizer.model
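Conceptually, the diff_checkpoint step with --recover_diff=True adds each delta tensor to the matching tensor in the base model. The toy sketch below illustrates that idea using nested dicts of plain Python lists in place of real checkpoints. It reflects my understanding of the step, not EasyLM's actual code, and the names `recover_weights`, `base` and `diff` are hypothetical.

```python
def recover_weights(base, diff):
    """Recursively add `diff` to `base`, leaf by leaf (merged = base + diff)."""
    if isinstance(base, dict):
        if base.keys() != diff.keys():
            raise ValueError("base and diff must have the same parameter names")
        return {name: recover_weights(base[name], diff[name]) for name in base}
    if isinstance(base, list):
        if len(base) != len(diff):
            raise ValueError("shape mismatch between base and diff")
        return [recover_weights(b, d) for b, d in zip(base, diff)]
    # Leaf: a single scalar weight
    return base + diff


# Tiny stand-ins for the base LLaMA weights and the Koala delta
base = {"layer0": {"wq": [[1.0, 2.0], [3.0, 4.0]]}, "norm": [1.0, 1.0]}
diff = {"layer0": {"wq": [[0.5, -0.5], [0.0, 1.0]]}, "norm": [0.25, -0.25]}

merged = recover_weights(base, diff)
```

The key and shape checks mirror why the diff has to be applied to the matching base model: a 13B diff only makes sense against the 13B LLaMA weights.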
Further info
Check out the following links to learn more about the Berkeley Koala model.
- Blog post
- Online demo
- EasyLM: training and serving framework on GitHub
- Documentation for running Koala locally
License
The model weights are intended for academic research only, subject to the model License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Any other usage of the model weights, including but not limited to commercial usage, is strictly prohibited.