Commit 68d4589, committed by flashvenom
Parent(s): f988a72

Update README.md

README.md CHANGED
@@ -1,4 +1,6 @@
-
+
+Model upload of Airoboros-13B-SuperHOT in 4-bit GPTQ version, converted using GPTQ-for-LLaMa; source model from https://huggingface.co/Peeepy/Airoboros-13b-SuperHOT-8k.
+## This uses the Airoboros-13B (v1.2) model and applies the SuperHOT LoRA on top, allowing for improved coherence at larger context lengths, as well as improving the output quality of Airoboros to be more verbose.

 You will need a monkey-patch at inference to use the 8k context; please see the patch file present. If you are using a different inference engine (like llama.cpp / exllama), you will need to add the monkey-patch there.
 ### Note: If you are using exllama, the monkey-patch is built into the engine; please use -cpe to set the scaling factor, i.e. if you are running it at 4k context, pass `-cpe 2 -l 4096`
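The note above gives one example of the scaling factor (`-cpe 2` for a 4k context). The sketch below shows the arithmetic that example appears to follow. It assumes the base model's native context is 2048 tokens and that the factor is simply the target context divided by that value; neither is stated in the README. The helper names `compression_factor` and `exllama_args` are hypothetical and are not part of exllama.

```python
# Assumption: the base LLaMA model was trained with a 2048-token context.
NATIVE_CONTEXT = 2048


def compression_factor(target_context: int, native_context: int = NATIVE_CONTEXT) -> float:
    """Return the positional scaling factor for a target context length.

    Hypothetical helper: divides the target context by the native context,
    and never returns less than 1 (no compression needed at or below native).
    """
    if target_context <= 0 or native_context <= 0:
        raise ValueError("context lengths must be positive")
    return max(1.0, target_context / native_context)


def exllama_args(target_context: int) -> str:
    """Format the two flags named in the README note for a given context length."""
    factor = compression_factor(target_context)
    # ':g' drops a trailing '.0', so 2.0 is printed as '2'.
    return f"-cpe {factor:g} -l {target_context}"


if __name__ == "__main__":
    print(exllama_args(4096))  # matches the README example: -cpe 2 -l 4096
    print(exllama_args(8192))  # under the same assumption: -cpe 4 -l 8192
```

Only the 4k case is confirmed by the README; the 8k value follows from the assumed rule and should be checked against the inference engine you use.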