• Q4F : Q4_K feed-forward (Q5_1 for ffn_down due to shape constraints)
  • Q8A : Q8_0 attention, Q8_0 output, Q8_0 embeds
  • Q8SH : Q8_0 shared experts
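
One way to sanity-check this per-tensor mix is to list each tensor's quantization type with the gguf Python package that ships alongside llama.cpp. A minimal sketch, assuming the file has been downloaded locally (`model.gguf` is a placeholder name, not the actual filename):

```python
# Minimal sketch: list per-tensor quantization types in a GGUF file.
# Assumes `pip install gguf`; the path below is a placeholder.
from collections import Counter

from gguf import GGUFReader

reader = GGUFReader("model.gguf")  # hypothetical local path

counts = Counter()
for tensor in reader.tensors:
    qtype = tensor.tensor_type.name  # e.g. Q4_K, Q5_1, Q8_0
    counts[qtype] += 1
    # Spot-check the ffn_down exception described above
    if "ffn_down" in tensor.name:
        print(f"{tensor.name}: {qtype}")

print(dict(counts))
```

If the mix above is accurate, the counts should be dominated by Q4_K and Q8_0, with Q5_1 appearing only on the ffn_down tensors.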

Runs at usable speeds on a 24 GiB GPU + 64 GB of system RAM.
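
At 110B parameters the model cannot fit entirely in 24 GiB of VRAM, so it has to be split between GPU and system memory. A minimal sketch of partial offload with llama-cpp-python (the layer count and path are illustrative assumptions, not measured settings):

```python
# Minimal sketch: load the GGUF with partial GPU offload via llama-cpp-python.
# Assumes a CUDA-enabled build (`pip install llama-cpp-python`); the path and
# layer count below are illustrative, not tuned values.
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",  # hypothetical local path
    n_gpu_layers=30,          # offload as many layers as fit in 24 GiB VRAM
    n_ctx=4096,               # keep context modest to save VRAM
)

out = llm("Hello, how are you?", max_tokens=64)
print(out["choices"][0]["text"])
```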

GGUF · Model size: 110B params · Architecture: glm4moe
