flashvenom committed
Commit
68d4589
1 Parent(s): f988a72

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -1,4 +1,6 @@
-Model upload in 4-bit GPTQ version, converted using GPTQ-for-LLaMa; source model from https://huggingface.co/Peeepy/Airoboros-13b-SuperHOT-8k.
+
+Model upload of Airoboros-13B-SuperHOT in 4-bit GPTQ version, converted using GPTQ-for-LLaMa; source model from https://huggingface.co/Peeepy/Airoboros-13b-SuperHOT-8k.
+## This uses the Airoboros-13B (v1.2) model and applies the SuperHOT LoRA on top, allowing for improved coherence at larger context lengths, as well as more verbose output from Airoboros.
 
 You will need a monkey-patch at inference to use the 8k context; please see the patch file present. If you are using a different inference engine (like llama.cpp / exllama), you will need to add the monkey patch there.
 ### Note: If you are using exllama, the monkey-patch is built into the engine; use -cpe to set the scaling factor, i.e. if you are running it at 4k context, pass `-cpe 2 -l 4096`