• Q4F : Q4_K feed-forward (Q5_1 for ffn_down due to shape constraints)
  • Q8A : Q8_0 attention, Q8_0 output, Q8_0 embeds
  • Q8SH : Q8_0 shared experts
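
One way to sanity-check this per-tensor mix is to list each tensor's quantization type with the gguf Python package that ships alongside llama.cpp. A minimal sketch, assuming the file has been downloaded locally (`model.gguf` is a placeholder name, not the actual filename):

```python
# Minimal sketch: list per-tensor quantization types in a GGUF file.
# Assumes `pip install gguf`; the path below is a placeholder.
from collections import Counter

from gguf import GGUFReader

reader = GGUFReader("model.gguf")  # hypothetical local path

counts = Counter()
for tensor in reader.tensors:
    qtype = tensor.tensor_type.name  # e.g. Q4_K, Q5_1, Q8_0
    counts[qtype] += 1
    # Spot-check the ffn_down exception described above
    if "ffn_down" in tensor.name:
        print(f"{tensor.name}: {qtype}")

print(dict(counts))
```

If the mix above is accurate, the counts should be dominated by Q4_K and Q8_0, with Q5_1 appearing only on the ffn_down tensors.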

Runs at usable speeds on a 24 GiB GPU + 64 GB of system RAM.
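
At 110B parameters the model cannot fit entirely in 24 GiB of VRAM, so it has to be split between GPU and system memory. A minimal sketch of partial offload with llama-cpp-python (the layer count and path are illustrative assumptions, not measured settings):

```python
# Minimal sketch: load the GGUF with partial GPU offload via llama-cpp-python.
# Assumes a CUDA-enabled build (`pip install llama-cpp-python`); the path and
# layer count below are illustrative, not tuned values.
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",  # hypothetical local path
    n_gpu_layers=30,          # offload as many layers as fit in 24 GiB VRAM
    n_ctx=4096,               # keep context modest to save VRAM
)

out = llm("Hello, how are you?", max_tokens=64)
print(out["choices"][0]["text"])
```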

GGUF · Model size: 110B params · Architecture: glm4moe
