---
license: other
datasets:
- georgesung/wizard_vicuna_70k_unfiltered
---

# Overview

This model is [Llama-3 8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) fine-tuned on an uncensored/unfiltered Wizard-Vicuna conversation dataset (`georgesung/wizard_vicuna_70k_unfiltered`). Fine-tuning used QLoRA.

This repository includes the fp32 Hugging Face version of the model, plus a 4-bit quantized (q4_0) [GGUF version](https://huggingface.co/georgesung/llama3_8b_chat_uncensored/resolve/main/llama3_8b_chat_uncensored_q4_0.gguf?download=true).

# Prompt style

The model was trained with the following prompt style:

```
### HUMAN:
Hello

### RESPONSE:
Hi, how are you?

### HUMAN:
I'm fine.

### RESPONSE:
How can I help you?
...
```

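The prompt style above can be assembled programmatically. The following is a minimal sketch, not code from the training repository: the function name `build_prompt` is hypothetical, and the exact whitespace (one newline after each header, one blank line between turns) is an assumption inferred from the example above.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], next_message: str) -> str:
    """Format a conversation in the ### HUMAN: / ### RESPONSE: style.

    history: completed (human, response) turns, oldest first.
    next_message: the new human message the model should answer.
    Whitespace between turns is an assumption based on the example prompt.
    """
    parts = []
    for human, response in history:
        parts.append(f"### HUMAN:\n{human}\n\n### RESPONSE:\n{response}\n\n")
    # Leave the final RESPONSE header open so the model completes it.
    parts.append(f"### HUMAN:\n{next_message}\n\n### RESPONSE:\n")
    return "".join(parts)
```

For example, `build_prompt([("Hello", "Hi, how are you?")], "I'm fine.")` produces a prompt that ends with an open `### RESPONSE:` header for the model to continue.
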
# Training code

The code used to train the model is available [here](https://github.com/georgesung/llm_qlora).

To reproduce the results:

```
git clone https://github.com/georgesung/llm_qlora
cd llm_qlora
pip install -r requirements.txt
python train.py configs/llama3_8b_chat_uncensored.yaml
```

# Fine-tuning guide

https://georgesung.github.io/ai/qlora-ift/

# Ollama inference

First, install [Ollama](https://ollama.com/). Then, following the instructions [here](https://github.com/ollama/ollama/blob/main/README.md#import-from-gguf), download the GGUF file:

```
cd $MODEL_DIR_OF_CHOICE
wget https://huggingface.co/georgesung/llama3_8b_chat_uncensored/resolve/main/llama3_8b_chat_uncensored_q4_0.gguf
```

Create a file called `llama3-uncensored.modelfile` with the following contents:

```
FROM ./llama3_8b_chat_uncensored_q4_0.gguf
TEMPLATE """{{ .System }}

### HUMAN:
{{ .Prompt }}

### RESPONSE:
"""
PARAMETER stop "### HUMAN:"
PARAMETER stop "### RESPONSE:"
```

Then run:

```
ollama create llama3-uncensored -f llama3-uncensored.modelfile
ollama run llama3-uncensored
```

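Once the model has been created, it can also be queried from Python through Ollama's local HTTP API. The sketch below is illustrative only: it assumes the Ollama server is listening at its default address `http://localhost:11434`, and the helper names `build_request` and `generate` are hypothetical. The template in the modelfile wraps the prompt in the `### HUMAN:` / `### RESPONSE:` format, so only the raw user message is sent.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3-uncensored") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3-uncensored") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `print(generate("Hello"))` should print the model's reply. Setting `"stream": False` makes the server return a single JSON object rather than a stream of partial responses, which keeps the client code short.
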