Notes

  • 05/02/26: The IQ3_S quant is coming a bit later; the quantization crashed, so it needs to be redone.

Model

This repo contains specialized MoE quants for zai-org/GLM-5.1. Because the FFN tensors are far larger than the rest of the tensors in the model, it should be possible to get better quality at a smaller overall size than a comparable naive quantization. To that end, the default quantization type is kept at high quality, while the FFN UP and FFN GATE tensors are quantized down, along with the FFN DOWN tensors (which get a slightly higher-precision type than UP and GATE).
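As a minimal sketch of how such a per-tensor mixture could be expressed, the snippet below maps tensor names to quant types for each recipe. It assumes the four entries in the table's Mixture column are, in order, default / FFN UP / FFN GATE / FFN DOWN, and that the routed-expert tensors follow llama.cpp's GGUF naming (`blk.N.ffn_up_exps.weight` and so on). `RECIPES` and `pick_type` are hypothetical names, not part of any tool, and shared-expert or dense FFN tensors simply fall through to the default type here, which is this sketch's choice rather than something the card specifies.

```python
import re

# Hypothetical recipe table, transcribed from the Mixture column:
# (default, FFN UP, FFN GATE, FFN DOWN)
RECIPES = {
    "Q5_K_M": ("Q8_0", "Q5_K", "Q5_K", "Q6_K"),
    "Q4_K_M": ("Q8_0", "Q4_K", "Q4_K", "Q5_K"),
    "IQ4_XS": ("Q8_0", "IQ3_S", "IQ3_S", "IQ4_XS"),
    "IQ3_S": ("Q6_K", "IQ2_S", "IQ2_S", "IQ3_S"),
}

# Assumed GGUF names for the routed-expert FFN tensors, e.g. "blk.12.ffn_up_exps.weight".
EXPERT_FFN = re.compile(r"^blk\.\d+\.ffn_(up|gate|down)_exps\.weight$")


def pick_type(tensor_name: str, recipe: str) -> str:
    """Return the quant type this sketch assigns to one tensor under a recipe."""
    default, up, gate, down = RECIPES[recipe]
    match = EXPERT_FFN.match(tensor_name)
    if match is None:
        # Anything that is not a routed-expert FFN tensor keeps the high-quality default.
        return default
    return {"up": up, "gate": gate, "down": down}[match.group(1)]


if __name__ == "__main__":
    for name in ("blk.0.attn_q.weight", "blk.5.ffn_up_exps.weight",
                 "blk.5.ffn_gate_exps.weight", "blk.5.ffn_down_exps.weight"):
        print(f"{name:30} -> {pick_type(name, 'IQ4_XS')}")
```

For IQ4_XS this assigns Q8_0 to the attention tensor, IQ3_S to the up and gate experts, and IQ4_XS to the down experts. A real quantizer also treats some tensors specially (for example, 1-D norm weights are typically left unquantized), which this sketch ignores.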

| Quant | Size | Mixture (default / FFN UP / FFN GATE / FFN DOWN) | PPL | Mean PPL(Q)/PPL(base) - 1 | KLD |
|---|---|---|---|---|---|
| Q5_K_M | 520.08 GiB (5.93 BPW) | Q8_0 / Q5_K / Q5_K / Q6_K | 2.732420 ± 0.015015 | +0.3411% | 0.020247 ± 0.000173 |
| Q4_K_M | 432.80 GiB (4.93 BPW) | Q8_0 / Q4_K / Q4_K / Q5_K | 2.754593 ± 0.015142 | +1.1553% | 0.037406 ± 0.000308 |
| IQ4_XS | 336.61 GiB (3.84 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 2.892748 ± 0.015981 | +6.2287% | 0.099818 ± 0.000754 |
| IQ3_S | 259.89 GiB (2.96 BPW) | Q6_K / IQ2_S / IQ2_S / IQ3_S | 3.282336 ± 0.018782 | +20.5353% | 0.262398 ± 0.001686 |
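The Size and relative-PPL columns are related to the other numbers by simple arithmetic, sketched below. It assumes BPW is the file size in bits divided by the "754B params" count from the model page (a rounded figure), and that the positive percentages are PPL(Q)/PPL(base) - 1, which is what their sign implies. `bits_per_weight` and `implied_base_ppl` are hypothetical helper names.

```python
# Figures transcribed from the table above: (size in GiB, PPL, relative PPL in %)
ROWS = {
    "Q5_K_M": (520.08, 2.732420, 0.3411),
    "Q4_K_M": (432.80, 2.754593, 1.1553),
    "IQ4_XS": (336.61, 2.892748, 6.2287),
    "IQ3_S": (259.89, 3.282336, 20.5353),
}

N_PARAMS = 754e9  # rounded "754B params" from the model page; the exact count is assumed to differ slightly
GIB = 2**30       # bytes in one GiB


def bits_per_weight(size_gib: float, n_params: float = N_PARAMS) -> float:
    """Average number of stored bits per parameter for a file of the given size."""
    return size_gib * GIB * 8 / n_params


def implied_base_ppl(ppl_q: float, rel_pct: float) -> float:
    """Invert rel = PPL(Q)/PPL(base) - 1 to recover PPL(base)."""
    return ppl_q / (1 + rel_pct / 100)


if __name__ == "__main__":
    for name, (size_gib, ppl_q, rel_pct) in ROWS.items():
        print(f"{name}: {bits_per_weight(size_gib):.2f} BPW, "
              f"implied base PPL {implied_base_ppl(ppl_q, rel_pct):.4f}")
```

Every row implies a base (unquantized) PPL of about 2.7231, consistent with all four percentages being measured against the same baseline. The computed BPW values match the card to within about 0.01, presumably because 754B is a rounded parameter count.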

[Figures: KLD graph, PPL graph]
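The card does not define its KLD column; it is most likely the mean per-token Kullback-Leibler divergence of the quant's next-token distribution from the base model's over the evaluation text, which is what llama.cpp's perplexity tool reports in its KL-divergence mode. A minimal pure-Python sketch of that computation, under that assumption and with hypothetical helper names (`softmax`, `kl_divergence`, `mean_kld`), looks like this:

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one token position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P || Q) in nats; entries where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(base_logits, quant_logits) -> float:
    """Average per-token KL of the quantized model's distribution from the base model's."""
    per_token = [
        kl_divergence(softmax(b), softmax(q))
        for b, q in zip(base_logits, quant_logits)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy logits for two token positions over a 3-token vocabulary (made up, not real model output).
    base = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
    quant = [[1.9, 1.1, 0.1], [0.6, 2.3, 0.1]]
    print(mean_kld(base, quant))
```

Unlike PPL, which only looks at the probability assigned to the correct token, KLD compares the whole distribution, so it can expose quantization drift even where PPL barely moves.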

Format: GGUF
Model size: 754B params
Architecture: glm-dsa

Bit widths: 3-bit, 4-bit, 5-bit


Model tree for AesSedai/GLM-5.1-GGUF
Base model: zai-org/GLM-5.1 (this model is a quantization of it)