info on models

README.md
https://github.com/qwopqwop200/GPTQ-for-LLaMa

LoRA credit to https://huggingface.co/baseten/alpaca-30b

# Update 2023-03-29
There is also a non-groupsize quantized model that is 1 GB smaller, which should allow running at the maximum context length on 24 GB of VRAM. The evaluations are better on the 128 groupsize version, but the tradeoff is not being able to run it at full context without offloading or a GPU with more VRAM.
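
As a rough sanity check on that claim, here is a back-of-the-envelope VRAM estimate. This is a sketch only: the layer and hidden dimensions are LLaMA-30B's published config, while the weight sizes and the fp16 KV-cache layout are assumptions, not measurements.

```python
# Back-of-the-envelope VRAM budget for LLaMA-30B 4-bit at full context.
# n_layers/hidden come from LLaMA-30B's config; weight sizes are approximations.
n_layers, hidden, ctx = 60, 6656, 2048

# KV cache: keys + values, fp16 (2 bytes), per layer, per token.
kv_gib = 2 * n_layers * hidden * 2 * ctx / 2**30
print(f"KV cache at {ctx} tokens: ~{kv_gib:.1f} GiB")  # ~3.0 GiB

# Assumed loaded weight sizes; the non-groupsize file is ~1 GB smaller.
for name, weights_gib in [("128 groupsize", 17.0), ("no groupsize", 16.0)]:
    print(f"{name}: ~{weights_gib + kv_gib:.1f} GiB + activation overhead, vs. a 24 GiB card")
```

With weights plus KV cache already near 20 GiB, activation overhead pushes the 128g file past a 24 GiB budget at full context, which is why the smaller file can make the difference.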

# Update 2023-03-27
New weights have been added. The old .pt version is no longer supported and has been replaced by a 128 groupsize safetensors file. Update to the latest GPTQ version/webui.
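
After updating, one quick way to confirm the safetensors file loads cleanly is to open it and list a few tensors. A minimal sketch using the `safetensors` library; the `qweight`/`qzeros`/`scales` naming mentioned in the comment is GPTQ-for-LLaMa's convention and is an assumption here, not something stated in this card.

```python
# Minimal sketch: peek at the quantized checkpoint without loading the model.
from safetensors import safe_open

with safe_open("alpaca-30b-4bit-128g.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())
    print(f"{len(keys)} tensors")
    # GPTQ-for-LLaMa stores each linear layer as qweight/qzeros/scales (assumption).
    for k in keys[:6]:
        t = f.get_tensor(k)
        print(k, tuple(t.shape), t.dtype)
```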

Evals - Groupsize 128 + True Sequential
-----
**alpaca-30b-4bit-128g.safetensors** [4805cc2]

**c4-new** - 6.398105144500732

**wikitext2** - 4.402845859527588

Evals - Default + True Sequential
-----
**alpaca-30b-4bit.safetensors** [6958004]

**c4-new** - 6.592941761016846

**ptb-new** - 8.718379974365234

**wikitext2** - 4.635514736175537
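
The figures above are perplexities on each dataset (lower is better). For readers who want to reproduce this kind of number, here is a generic sliding-window perplexity sketch, assuming a HuggingFace-style causal LM and a pre-tokenized test stream are already in hand; the 2048-token window matches LLaMA's context length, and everything else is illustrative rather than GPTQ-for-LLaMa's exact eval code.

```python
import torch

@torch.no_grad()
def perplexity(model, input_ids, window=2048):
    """Non-overlapping-window perplexity over one long token stream.

    Assumes a HuggingFace-style causal LM that returns the mean
    cross-entropy as `loss` when `labels` are supplied (labels are
    shifted internally by the model).
    """
    total_nll, total_tokens = 0.0, 0
    for start in range(0, input_ids.size(1), window):
        chunk = input_ids[:, start:start + window].to(model.device)
        if chunk.size(1) < 2:          # nothing left to predict in this chunk
            continue
        out = model(chunk, labels=chunk)
        n = chunk.size(1) - 1          # tokens actually scored
        total_nll += out.loss.item() * n
        total_tokens += n
    return torch.exp(torch.tensor(total_nll / total_tokens)).item()
```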

# Usage
1. Run it manually through GPTQ (a sketch follows this list)
2. (More setup but better UI) - Use the [text-generation-webui](https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode). Make sure to follow the installation steps [here](https://github.com/oobabooga/text-generation-webui#installation) before adding GPTQ support.
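
For step 1, a minimal inference sketch in the spirit of GPTQ-for-LLaMa's `llama_inference.py`. The `load_quant` helper, its signature, and the paths here are assumptions based on the March-2023 code and may differ between revisions, so treat this as a template and check your checkout.

```python
# Minimal sketch of manual 4-bit inference (run from a GPTQ-for-LLaMa checkout).
# load_quant's module location and signature vary by revision (assumption).
import torch
from transformers import AutoTokenizer
from llama import load_quant  # assumption: helper from the GPTQ-for-LLaMa repo

base = "path/to/llama-30b-hf"  # placeholder: base model dir for config/tokenizer
# wbits=4, groupsize=128 for the 128g file; GPTQ-for-LLaMa conventionally uses
# groupsize=-1 for the non-groupsize file (assumption).
model = load_quant(base, "alpaca-30b-4bit-128g.safetensors", 4, 128)
model.to("cuda").eval()

tok = AutoTokenizer.from_pretrained(base)
prompt = "Below is an instruction that describes a task."
ids = tok(prompt, return_tensors="pt").input_ids.to("cuda")
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=128, do_sample=True, top_p=0.95)
print(tok.decode(out[0], skip_special_tokens=True))
```

For step 2, the linked wiki page covers launching the webui in 4-bit mode; per that page, the relevant launch flags at the time were `--wbits 4` and, for the 128g file, `--groupsize 128`.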