This is a 2.4 bpw (bits-per-weight) quantization of Aurelian v0.1alpha 70B 32K, released for testing & feedback. See that model's page for more details.

This quantization fits on a single 24GB GPU using ExLlamaV2 with the 8-bit cache at 10K context. It uses turboderp's newer experimental quantization method.
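As a rough illustration, loading a quant like this with ExLlamaV2's Python API and the 8-bit cache might look like the sketch below. The model directory path is a placeholder, and running it requires a CUDA GPU with ~24GB VRAM plus the downloaded weights, so treat it as a starting point rather than a verified recipe.

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_8bit,
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "./aurelian-v0.1alpha-70b-32k-2.4bpw"  # hypothetical local path
config.prepare()
config.max_seq_len = 10240  # cap context at ~10K tokens to stay within 24GB

model = ExLlamaV2(config)

# The 8-bit cache roughly halves KV-cache memory versus FP16, which is
# what lets a 10K context fit alongside the 2.4 bpw weights in 24GB.
cache = ExLlamaV2Cache_8bit(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
```

From here the model and cache can be handed to an ExLlamaV2 generator for inference; no test is attached since the snippet is hardware- and weights-dependent.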
