Update README.md

---
datasets:
- jondurbin/airoboros-gpt4-1.4.1
---

**7/6: This may be a little undertrained. I'll update the weights if I end up training it longer and/or with better hyperparameters. For now, I'm working on 7b.**

# RoPE Scaled QLoRA Fine-tune of Llama-13b on airoboros-gpt4-1.4.1 (GPTQ)

LoRA Weights can be found here: https://huggingface.co/bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-LoRA

- Used airoboros-gpt4-1.4.1 dataset instead of airoboros-gpt4-1.4
- **This is a QLoRA fine-tune**. The original 13b model is a full fine-tune.

It was trained on 1x RTX 6000 Ada for ~17 hours.
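
Since the LoRA weights are published separately (link above), here is a minimal sketch of attaching that adapter to an unquantized base Llama-13b with `transformers` and `peft`. The base-model ID, dtype, and merge step are assumptions for illustration, not instructions from this card; the adapter alone also does not extend the context window, so the RoPE scaling described under "How to Use" is still needed.

```python
# Sketch only: attach the separately published LoRA adapter to a base Llama-13b.
# The base checkpoint ID and dtype below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "huggyllama/llama-13b"  # assumed base model
LORA_ID = "bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-LoRA"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, LORA_ID)  # attach the adapter weights
model = model.merge_and_unload()  # optional: fold the adapter into the base weights

# Note: this alone does not extend the usable context; positions must also be
# compressed by 1/4 (RoPE position interpolation), as with compress_pos_emb = 4.
```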

## How to Use
The easiest way is to use [oobabooga text-generation-webui](https://github.com/oobabooga/text-generation-webui) with ExLlama. You'll need to set `max_seq_len` to 8192 and `compress_pos_emb` to 4.
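
For intuition, `compress_pos_emb = 4` corresponds to RoPE position interpolation with a 1/4 scaling factor: position indices are divided by 4 so that an 8192-token window lands inside the ~2048-position range the base model was pretrained on. The standalone NumPy sketch below illustrates the idea only; it is not the ExLlama or text-generation-webui implementation, and the head dimension is an assumption.

```python
# Illustration of RoPE position interpolation (what compress_pos_emb = 4 does).
# Standalone sketch, not the ExLlama implementation; head_dim is assumed.
import numpy as np

def rope_angles(positions, head_dim=128, base=10000.0, scale=1.0):
    """Rotary-embedding angles; scale < 1 compresses positions (position interpolation)."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))  # (head_dim/2,)
    scaled = np.asarray(positions, dtype=np.float64) * scale           # compressed positions
    return np.outer(scaled, inv_freq)                                  # (n_positions, head_dim/2)

# With scale = 1/4, position 8188 gets exactly the angles position 2047 had
# originally, so an 8192-token context stays inside the pretrained position range.
extended = rope_angles(range(8192), scale=1.0 / 4)
original = rope_angles(range(2048), scale=1.0)
assert np.allclose(extended[8188], original[2047])
```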

## Relative Performance (perplexity)
| Model | Context (tokens) | Perplexity |
| ---------------------------------------------------- | ---------------- | ---------- |
| TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 2048 | 5.98 |
| TheBloke/airoboros-13B-gpt4-1-4-SuperHOT-8K-GPTQ | 4096 | 5.80 |
| **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | **2048** | **5.28** |
| **bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ** | **4096** | **5.15** |

- How does this reduction in perplexity translate into actual performance lift on downstream tasks? I haven't used models with the SuperHOT LoRA enough to have a good sense of the performance differences, but feedback on the 33b variant suggests the improvement is noticeable, particularly in coherence at longer context lengths.
- This comparison isn't perfect. I did use the 1.4.1 dataset, the quantization method is slightly different, and the fine-tuning method is different (QLoRA vs. full fine-tune). In short, there are other potentially influential variables responsible for these performance differences. A rough recipe for measuring perplexity at a fixed context length is sketched below.
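
For anyone who wants to produce comparable numbers, here is a generic sketch of measuring perplexity at a fixed context length. The dataset, non-overlapping windowing, and model loading are assumptions for illustration, not the exact evaluation setup behind the table above; loading a GPTQ checkpoint through `transformers` typically also requires extra packages such as optimum and auto-gptq.

```python
# Generic sketch: perplexity at a fixed context length over non-overlapping windows.
# The dataset and windowing here are assumptions, not this card's exact evaluation setup.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ"  # or a comparison model
CONTEXT_LEN = 4096

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
model.eval()

# Tokenize a long evaluation text and score it in fixed-size chunks.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids[0]

nlls = []
with torch.no_grad():
    for start in range(0, ids.numel() - CONTEXT_LEN, CONTEXT_LEN):
        window = ids[start:start + CONTEXT_LEN].unsqueeze(0).to(model.device)
        nlls.append(model(window, labels=window).loss)  # mean NLL for this window

print(f"perplexity @ {CONTEXT_LEN}: {torch.exp(torch.stack(nlls).mean()).item():.2f}")
```

Because each window is scored independently, every token is conditioned on at most `CONTEXT_LEN - 1` predecessors, which is one reason perplexity generally drops as the evaluation context grows, as in the table above.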

## Quantization: