
3bit quantized version of this: https://huggingface.co/ausboss/llama-30b-supercot

GPTQ quantization using https://github.com/0cc4m/GPTQ-for-LLaMa

Made at the request of someone who wanted a 3bit version. The file is 17% smaller than the 4bit non-groupsize version, but its wikitext2 perplexity is 12% worse. I don't have a functioning Ooba (text-generation-webui) install, so I can't test this myself.

Command used to quantize:
python llama.py c:\llama-30b-supercot c4 --wbits 3 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors
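
Since the file has not been tested in a UI, one cheap sanity check is to read the header of the saved .safetensors file and list the tensors it contains. The sketch below is not part of the original workflow. It relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header), and the helper names `read_safetensors_header` and `summarize` are made up for illustration.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(header):
    """Map each tensor name to (dtype, shape), skipping the metadata entry."""
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }
```

For example, `summarize(read_safetensors_header("4bit-128g.safetensors"))` would list every stored tensor with its dtype and shape. This only confirms that the file is well formed; it says nothing about output quality.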

Evaluation perplexity scores (lower is better):

  • WikiText2: 5.22 (12% worse than 4bit non-groupsize)
  • PTB: 19.63 (11% worse than 4bit non-groupsize)
  • C4: 6.93 (7% worse than 4bit non-groupsize)
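
To make the "% worse" figures concrete, here is a minimal sketch of the arithmetic, assuming each gap is the relative perplexity increase over the 4bit non-groupsize model. The 4bit baseline scores are not stated here, so the script only back-computes approximate values from the rounded percentages; treat them as rough estimates rather than measured results. The function names are hypothetical.

```python
def percent_worse(ppl: float, baseline: float) -> float:
    """Relative perplexity increase over a baseline, in percent."""
    return (ppl - baseline) / baseline * 100.0


def implied_baseline(ppl: float, pct_worse: float) -> float:
    """Invert percent_worse: the baseline implied by a score and its gap."""
    return ppl / (1.0 + pct_worse / 100.0)


# 3bit scores and rounded gaps as reported in the list above.
reported = {"WikiText2": (5.22, 12), "PTB": (19.63, 11), "C4": (6.93, 7)}

for name, (ppl, gap) in reported.items():
    # Approximate only: the gap percentages are rounded to whole numbers.
    base = implied_baseline(ppl, gap)
    print(f"{name}: 3bit={ppl:.2f}, implied 4bit baseline ~{base:.2f}")
```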

4bit non-groupsize version is here: https://huggingface.co/tsumeone/llama-30b-supercot-4bit-cuda

4bit 128 groupsize version is here: https://huggingface.co/tsumeone/llama-30b-supercot-4bit-128g-cuda
