slow

#4
by ehartford - opened
MLX Community org

I am trying to use this on an M3 Max.
It's so slow that it's unusable (0.51 tokens/second).
Is there any way to make it faster?

I can't compare it with llama.cpp because it doesn't work there yet.
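
For anyone who wants to reproduce a throughput figure like the one above, here is a minimal sketch of how tokens/second can be measured. It is not tied to any particular library: `tokens_per_second` and `fake_generate` are hypothetical names, and `fake_generate` is only a stand-in for whatever streaming generation call you actually use.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(generate: Callable[[], Iterable[object]]) -> float:
    """Consume a token stream and return throughput in tokens/second."""
    start = time.perf_counter()
    count = sum(1 for _ in generate())  # count tokens as they are produced
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_generate():
    # Hypothetical stand-in for a real model's streaming generation call.
    for i in range(5):
        time.sleep(0.01)  # simulate per-token latency
        yield i


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate)
    print(f"{rate:.2f} tokens/second")
```

At 0.51 tokens/second, each token takes roughly 2 seconds, so measuring over a few dozen tokens is enough to get a stable number.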
