First commit of GPTQ model
- README.md +83 -0
- VicUnlocked-30B-GPTQ-4bit.act-order.safetensors +3 -0
- config.json +23 -0
- generation_config.json +7 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer.model +3 -0
- tokenizer_config.json +33 -0
README.md
ADDED
@@ -0,0 +1,83 @@
---
datasets:
- gozfarb/ShareGPT_Vicuna_unfiltered
---

# VicUnlocked-30B-LoRA GPTQ

This is a GPTQ-format, quantised 4-bit model of [Neko Institute of Science's VicUnLocked 30B LoRA](https://huggingface.co/Neko-Institute-of-Science/VicUnLocked-30b-LoRA).

The files in this repo are the result of merging the above LoRA with the original LLaMA 30B, then quantising to 4-bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
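
For reference, below is a minimal sketch of that merge step using the PEFT library. This is a hypothetical reconstruction, not the author's exact commands; the repo IDs come from the links in this README, and the output path matches the HF path used in the GPTQ command further down.

```python
# Hypothetical reconstruction of the LoRA merge step, assuming PEFT.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "Neko-Institute-of-Science/LLaMA-30B-HF",
    torch_dtype=torch.float16,
)
# Apply the LoRA adapter, then fold its deltas into the base weights.
merged = PeftModel.from_pretrained(
    base, "Neko-Institute-of-Science/VicUnLocked-30b-LoRA"
).merge_and_unload()
merged.save_pretrained("/workspace/vicunlocked-30b/HF")  # fp16 input for GPTQ-for-LLaMa
```
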
## Repositories available

* [4-bit, 5-bit and 8-bit GGML models for CPU inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GGML).
* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-GPTQ).
* [float16 HF format model for GPU inference and further conversions](https://huggingface.co/TheBloke/VicUnlocked-30B-LoRA-HF).

## How to easily download and use this model in text-generation-webui

Open text-generation-webui as normal, then follow these steps (a scripted download alternative is sketched after the list).

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `TheBloke/VicUnlocked-30B-LoRA-GPTQ`.
3. Click **Download**.
4. Wait until it says it's finished downloading.
5. Click the **Refresh** icon next to **Model** in the top left.
6. In the **Model drop-down**, choose the model you just downloaded, `VicUnlocked-30B-LoRA-GPTQ`.
7. If you see an error in the bottom right, ignore it - it's temporary.
8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = None`, `model_type = Llama`.
9. Click **Save settings for this model** in the top right.
10. Click **Reload the Model** in the top right.
11. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
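
If you prefer to script steps 2-4, here is a minimal sketch using `huggingface_hub`. This is an alternative to the UI downloader, not what the webui runs internally; adjust `local_dir` to your own layout.

```python
# Download the whole repo into text-generation-webui's models folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/VicUnlocked-30B-LoRA-GPTQ",
    local_dir="models/VicUnlocked-30B-LoRA-GPTQ",
)
```
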
## Provided files

**Compatible file - VicUnlocked-30B-GPTQ-4bit.act-order.safetensors**

In the `main` branch - the default one - you will find `VicUnlocked-30B-GPTQ-4bit.act-order.safetensors`.

This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.

It was created without groupsize so as to minimise VRAM requirements, and with the `--act-order` parameter to improve inference quality.

* `VicUnlocked-30B-GPTQ-4bit.act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
  * Works with AutoGPTQ
  * Works with text-generation-webui one-click-installers
  * Parameters: Groupsize = None; act-order
  * Command used to create the GPTQ:
    ```
    llama.py /workspace/vicunlocked-30b/HF wikitext2 --wbits 4 --true-sequential --act-order --save_safetensors /workspace/vicunlocked-30b/gptq/VicUnlocked-30B-GPTQ-4bit.act-order.safetensors
    ```
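
Since the list above notes the file works with AutoGPTQ, here is a minimal loading sketch. The bit width, groupsize and basename mirror this README's settings and the actual file in this repo, but exact arguments may vary between AutoGPTQ versions, so treat this as an assumption rather than a verified recipe.

```python
# Hypothetical example: load the quantised model with AutoGPTQ and generate.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

repo = "TheBloke/VicUnlocked-30B-LoRA-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename="VicUnlocked-30B-GPTQ-4bit.act-order",  # file in this repo, minus extension
    use_safetensors=True,
    device="cuda:0",
)
prompt = "USER: Write a haiku about llamas.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```
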
# Original model card

# Convert tools
https://github.com/practicaldreamer/vicuna_to_alpaca

# Training tool
https://github.com/oobabooga/text-generation-webui

At the moment I'm using version 2023.05.04v0 of the dataset and training at full context.

# Notes:
I will only be training for 1 epoch, as full-context 30B takes so long to train.
This 1 epoch will take me 8 days, but luckily the LoRA feels fully functional at epoch 1, as shown by my 13B one.
I will also be uploading checkpoints almost every day, and I could train another epoch if there's enough demand for it.

Update: Since I will not be training beyond 1 epoch, @Aeala is training for the full 3 epochs: https://huggingface.co/Aeala/VicUnlocked-alpaca-half-30b-LoRA. Note it's half context, if you care about that. @Aeala is also just about done.

Update: Training finished at epoch 1. These 8 days sure felt long. I only have one A6000, lads - there's only so much I can do. Also, RIP gozfarb; I don't know what happened to him.
# How to test?
1. Download LLaMA-30B-HF if you have not already: https://huggingface.co/Neko-Institute-of-Science/LLaMA-30B-HF
2. Make a folder called VicUnLocked-30b-LoRA in the loras folder.
3. Download adapter_config.json and adapter_model.bin into VicUnLocked-30b-LoRA.
4. Load ooba: ```python server.py --listen --model LLaMA-30B-HF --load-in-8bit --chat --lora VicUnLocked-30b-LoRA```
5. Select instruct mode and choose the Vicuna-v1.1 template.
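
For those not using ooba, here is a rough Python equivalent of the test above, assuming `transformers` and `peft` with 8-bit loading (requires `bitsandbytes`). The paths mirror the steps above; this is a sketch, not the author's procedure.

```python
# Hypothetical script version of the ooba test: base model + LoRA in 8-bit.
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base = LlamaForCausalLM.from_pretrained(
    "Neko-Institute-of-Science/LLaMA-30B-HF",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "loras/VicUnLocked-30b-LoRA")
tokenizer = LlamaTokenizer.from_pretrained("Neko-Institute-of-Science/LLaMA-30B-HF")

# Vicuna-v1.1 style prompt, matching the template chosen in step 5.
prompt = "USER: Hello!\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```
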
# Training Log
https://wandb.ai/neko-science/VicUnLocked/runs/vx8yzwi7
VicUnlocked-30B-GPTQ-4bit.act-order.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1c55b158251901afd8671ff738e95913ea38094f84a0e9903d8851799b8ee9d2
size 16940128404
config.json
ADDED
@@ -0,0 +1,23 @@
{
  "_name_or_path": "/workspace/models/LLaMA-30B-HF",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 6656,
  "initializer_range": 0.02,
  "intermediate_size": 17920,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 52,
  "num_hidden_layers": 60,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.29.2",
  "use_cache": true,
  "vocab_size": 32000
}
generation_config.json
ADDED
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.29.2"
}
special_tokens_map.json
ADDED
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
tokenizer_config.json
ADDED
@@ -0,0 +1,33 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}