Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ datasets:
 
 
 <!-- LoRA Weights can be found here: https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-LoRA -->
-
+GPTQ weights can be found here: https://huggingface.co/bhenrym14/airophin-v2-13b-PI-8k-GPTQ
 
 ## Overview
 
@@ -28,7 +28,7 @@ The finetune was performed with 1x RTX 6000 Ada.
 
 This model employs linear RoPE scaling, which now has native support in Transformers (be sure to update it if you have issues). Use it as you would with any normal context length variant.
 
-Please comment with any questions.
+Please comment with any questions. The GPTQ version can be found [here](https://huggingface.co/bhenrym14/airophin-v2-13b-PI-8k-GPTQ). I may upload a GGML version soon, especially if anyone expresses interest.
 
 Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 8192 to utilize the full context capabilities.
 
@@ -48,7 +48,7 @@ Previous experiments have demonstrated that orca-like datasets yield substantial
 | 12000 | 55 | **4.82** | 56.1 | Not Tested | Not Tested |
 
 - This model is very competitive with the Llama-1 33b extended context variants. In fact, it outperforms bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 everywhere <=8192 tokens. Do note, however, that the 33b model is only trained on the 1.4.1 Airoboros dataset. Additionally, this model only requires a PI factor of 2, whereas the 33b-16k llama1 model requires a factor of 8. It is clear from my experiments and those in the literature that higher factors pose larger challenges for performance recovery.
-- Not presented here, but this model outperforms the base llama-2-13b on MMLU-fs with a score of
+- Not presented here, but this model outperforms the base llama-2-13b on MMLU-fs with a score of ~57.3 (computed on a subset of the full benchmark). If this score ends up being replicated on the HF LLM leaderboard, **this would be the highest MMLU score for a 13b extended context model** and #4 overall for 13b (as of 8/15).
 - Feedback regarding real-world performance is appreciated. Llama2-13b is known to have repetition problems. Does the extensive training on top of the base model help ameliorate this tendency? Perplexity and MMLU are great, but they don't tell the whole story.
 
 ## Prompting:
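For readers unfamiliar with the "linear RoPE scaling" and "PI factor of 2" mentioned in the README text, here is a minimal sketch of the arithmetic. It is not the author's training code. It assumes the Llama-2 pretraining context of 4096 tokens and the standard rotary base of 10000. The helper names `pi_factor` and `rope_angles` are hypothetical, and the `rope_scaling` dict only mirrors the shape of the entry Transformers reads from a Llama config.

```python
BASE_CONTEXT = 4096    # assumption: Llama-2 pretraining context length
TARGET_CONTEXT = 8192  # context length named in the README


def pi_factor(target_ctx: int, base_ctx: int) -> float:
    """Linear (position interpolation) scaling factor: target / base."""
    return target_ctx / base_ctx


def rope_angles(position: float, head_dim: int, factor: float = 1.0,
                base: float = 10000.0) -> list[float]:
    """Rotary angles for one position.

    Linear scaling divides the position index by `factor`, so positions
    beyond the original context are squeezed back into the trained range.
    """
    scaled = position / factor
    return [scaled / base ** (2 * i / head_dim) for i in range(head_dim // 2)]


factor = pi_factor(TARGET_CONTEXT, BASE_CONTEXT)

# Illustrative: shape of a linear `rope_scaling` config entry.
rope_scaling = {"type": "linear", "factor": factor}

# With factor 2, position 8191 gets the same angles that position 4095.5
# would get in the unscaled model, i.e. it stays inside the trained range.
assert rope_angles(8191, 128, factor) == rope_angles(4095.5, 128)
```

The same arithmetic gives a factor of 8 for a hypothetical 2048-token base stretched to 16384 tokens, which is the harder interpolation case the README contrasts against.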