- Q4F : Q4_K feed-forward (Q5_1 for ffn_down due to shape constraints)
- Q8A : Q8_0 attention, Q8_0 output, Q8_0 embeds
- Q8SH : Q8_0 shared experts
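The recipe above can be sketched as a tensor-name-to-quant-type mapping. This is an illustrative sketch only, assuming llama.cpp-style GGUF tensor names (`ffn_down`, `attn_q`, `shexp`, etc.) and the k-quant super-block size of 256; the function `pick_quant` is hypothetical and not the actual tool used to produce these files.

```python
# Hypothetical sketch of the mixed Q4F/Q8A/Q8SH quantization recipe.
# Tensor names follow llama.cpp GGUF conventions; the selection logic
# is an illustration, not the exact pipeline used for this model.

QK_K = 256  # k-quant super-block size: row size must be divisible by this for Q4_K


def pick_quant(tensor_name: str, row_size: int) -> str:
    """Return the quant type for a tensor under the scheme described above."""
    # Q8A / Q8SH: attention, output, embeddings, and shared experts stay at Q8_0
    if any(key in tensor_name for key in ("attn", "output", "embd", "shexp")):
        return "Q8_0"
    # Shape constraint: Q4_K needs rows divisible by 256, so ffn_down
    # tensors that do not satisfy this fall back to Q5_1 (block size 32)
    if "ffn_down" in tensor_name and row_size % QK_K != 0:
        return "Q5_1"
    # Q4F: remaining feed-forward tensors
    if "ffn" in tensor_name:
        return "Q4_K"
    # Conservative default for anything unmatched
    return "Q8_0"


print(pick_quant("blk.0.ffn_gate.weight", 4096))       # → Q4_K
print(pick_quant("blk.0.ffn_down.weight", 1408))       # → Q5_1 (1408 % 256 != 0)
print(pick_quant("blk.0.attn_q.weight", 4096))         # → Q8_0
print(pick_quant("blk.0.ffn_up_shexp.weight", 4096))   # → Q8_0
```

The point of the fallback is that Q4_K packs weights into super-blocks of 256, so any tensor whose row length is not a multiple of 256 cannot use it; Q5_1 uses 32-element blocks and has no such constraint.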
Reasonable speeds on a 24 GiB GPU + 64 GB RAM.