slow (#4)
opened by ehartford
I am trying to use this on an M3 Max.
It's so slow that it's unusable (0.51 tokens/second).
Is it possible to make it any faster?
I can't compare it with llama.cpp because it doesn't work there yet.
Yes, it's possible to make it faster. You can read more here:
https://huggingface.co/mlx-community/c4ai-command-r-plus-4bit/discussions/2#6613dfdebf1904adf1ef89b9