Update README.md

---
datasets:
- jondurbin/airoboros-gpt4-1.4.1
---

**7/6: This may be a little undertrained. I'll update the weights if I end up training it longer and/or with better hyperparameters. For now, I'm working on 7b.**

# RoPE Scaled QLoRA Fine-tune of Llama-13b on airoboros-gpt4-1.4.1 (GPTQ)

LoRA Weights can be found here: https://huggingface.co/bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-LoRA

- Used airoboros-gpt4-1.4.1 dataset instead of airoboros-gpt4-1.4
- **This is a QLoRA fine-tune**. The original 13b model is a full fine-tune.

It was trained on 1x RTX 6000 Ada for ~17 hours.
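
Since the LoRA weights are published separately (link above), here is a minimal sketch of attaching that adapter to an unquantized base Llama-13b with `transformers` and `peft`. The base-model ID, dtype, and merge step are assumptions for illustration, not instructions from this card; the adapter alone also does not extend the context window, so the RoPE scaling described under "How to Use" is still needed.

```python
# Sketch only: attach the separately published LoRA adapter to a base Llama-13b.
# The base checkpoint ID and dtype below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "huggyllama/llama-13b"  # assumed base model
LORA_ID = "bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-LoRA"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, LORA_ID)  # attach the adapter weights
model = model.merge_and_unload()  # optional: fold the adapter into the base weights

# Note: this alone does not extend the usable context; positions must also be
# compressed by 1/4 (RoPE position interpolation), as with compress_pos_emb = 4.
```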

## How to Use
The easiest way is to use [oobabooga text-generation-webui](https://github.com/oobabooga/text-generation-webui) with ExLlama. You'll need to set `max_seq_len` to 8192 and `compress_pos_emb` to 4.
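
For intuition, `compress_pos_emb = 4` corresponds to RoPE position interpolation with a 1/4 scaling factor: position indices are divided by 4 so that an 8192-token window lands inside the ~2048-position range the base model was pretrained on. The standalone NumPy sketch below illustrates the idea only; it is not the ExLlama or text-generation-webui implementation, and the head dimension is an assumption.

```python
# Illustration of RoPE position interpolation (what compress_pos_emb = 4 does).
# Standalone sketch, not the ExLlama implementation; head_dim is assumed.
import numpy as np

def rope_angles(positions, head_dim=128, base=10000.0, scale=1.0):
    """Rotary-embedding angles; scale < 1 compresses positions (position interpolation)."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))  # (head_dim/2,)
    scaled = np.asarray(positions, dtype=np.float64) * scale           # compressed positions
    return np.outer(scaled, inv_freq)                                  # (n_positions, head_dim/2)

# With scale = 1/4, position 8188 gets exactly the angles position 2047 had
# originally, so an 8192-token context stays inside the pretrained position range.
extended = rope_angles(range(8192), scale=1.0 / 4)
original = rope_angles(range(2048), scale=1.0)
assert np.allclose(extended[8188], original[2047])
```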

## Relative Performance (perplexity)
| Model | Context (tokens) | Perplexity |
| ---------------------------------------------------- | ---------------- | ---------- |
| TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 2048 | 5.98 |
| TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 4096 | 5.80 |
| **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | **2048** | **5.28** |
| **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | **4096** | **5.15** |

- How does this reduction in perplexity translate into actual performance lift on downstream tasks? I haven't used models with the SuperHOT LoRA enough to have a good sense of the performance differences, but feedback on the 33b variant suggests the improvement is noticeable, particularly in coherence at longer context lengths.
- This comparison isn't perfect. I did use the 1.4.1 dataset, the quantization method is slightly different, and the fine-tuning method is different (QLoRA vs. full fine-tune). In short, there are other potentially influential variables responsible for these performance differences. A rough recipe for measuring perplexity at a fixed context length is sketched below.
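
For anyone who wants to produce comparable numbers, here is a generic sketch of measuring perplexity at a fixed context length. The dataset, non-overlapping windowing, and model loading are assumptions for illustration, not the exact evaluation setup behind the table above; loading a GPTQ checkpoint through `transformers` typically also requires extra packages such as optimum and auto-gptq.

```python
# Generic sketch: perplexity at a fixed context length over non-overlapping windows.
# The dataset and windowing here are assumptions, not this card's exact evaluation setup.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ"  # or a comparison model
CONTEXT_LEN = 4096

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
model.eval()

# Tokenize a long evaluation text and score it in fixed-size chunks.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids[0]

nlls = []
with torch.no_grad():
    for start in range(0, ids.numel() - CONTEXT_LEN, CONTEXT_LEN):
        window = ids[start:start + CONTEXT_LEN].unsqueeze(0).to(model.device)
        nlls.append(model(window, labels=window).loss)  # mean NLL for this window

print(f"perplexity @ {CONTEXT_LEN}: {torch.exp(torch.stack(nlls).mean()).item():.2f}")
```

Because each window is scored independently, every token is conditioned on at most `CONTEXT_LEN - 1` predecessors, which is one reason perplexity generally drops as the evaluation context grows, as in the table above.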

## Quantization: