What GPU size is required to run this? Is a 4090 possible, and does it support Ollama?

#5
by sminbb - opened

> What GPU size is required to run this? Is a 4090 possible, and does it support Ollama?

A 4090 should be good enough. Yes, Ollama would be helpful since these are GGUF files; however, you will have to import the GGUF into Ollama.
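A minimal sketch of importing a local GGUF into Ollama via a `Modelfile` (the filename below is hypothetical; for a split GGUF, point `FROM` at the first shard):

```
# Modelfile -- hypothetical path to the downloaded quant
FROM ./DeepSeek-V3-Q2_K_XL-00001-of-00005.gguf
```

Then register and run it with `ollama create deepseek-v3 -f Modelfile` followed by `ollama run deepseek-v3`.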

Unsloth AI org

> What GPU size is required to run this? Is a 4090 possible, and does it support Ollama?

Yes, a 4090 is enough. You don't even need a GPU; a CPU with 48GB of RAM will be enough.

At the moment Ollama does not support it as far as I'm aware, so you will need to use llama.cpp.
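A CPU-only llama.cpp invocation might look like this (the model path is hypothetical; `llama-cli` is built from the llama.cpp repo):

```
# With --n-gpu-layers 0 everything runs on CPU; the weights are
# mmap'd from disk and paged through RAM as needed.
./llama-cli -m DeepSeek-V3-Q2_K_XL-00001-of-00005.gguf \
    -p "Hello" --n-gpu-layers 0
```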

Wait... you're telling me that if I have, say:

- AMD Ryzen 7 5700X (8 cores, 2 threads per core)
- RTX 4090
- 64GB DDR4 RAM

...I could run a DeepSeek V3 quant?

Unsloth AI org

> Wait... you're telling me that if I have, say:
>
> - AMD Ryzen 7 5700X (8 cores, 2 threads per core)
> - RTX 4090
> - 64GB DDR4 RAM
>
> ...I could run a DeepSeek V3 quant?

Yes, that is correct, but it will probably be slow.

But you need enough RAM to load the whole model into memory, no?

Unsloth AI org

> But you need enough RAM to load the whole model into memory, no?

Nope, you actually don't, but it will be slow. With a GPU and layer offloading it will be faster.
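With a 24GB 4090 you can offload some layers to VRAM via llama.cpp's `--n-gpu-layers` (`-ngl`); a sketch, where both the model path and the layer count are illustrative guesses rather than tuned values:

```
# Offload a subset of layers to the GPU; raise -ngl until VRAM is full.
./llama-cli -m DeepSeek-V3-Q2_K_XL-00001-of-00005.gguf -p "Hello" -ngl 8
```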

You mean if I have a Mac mini with 64GB, is that enough to run this model? And when you say slow, how much slowness are we talking about, taking the Mac mini as an example?

Unsloth AI org

> You mean if I have a Mac mini with 64GB, is that enough to run this model? And when you say slow, how much slowness are we talking about, taking the Mac mini as an example?

Yes, you can definitely run the model if you use the 2-bit quant. As for slowness, you might get 1.5 tokens or fewer per second.

In some Discord channel (I think LM Studio's), this was discussed extensively, and everyone said that you definitely need that much RAM to load the model, and that inference then needs less memory since it's MoE. So it's quite interesting that you don't need that much RAM even to load it. But if the complete model isn't loaded, how does it decide which params to activate during inference? (I'm a bit of a newbie, so I might be confused in asking these questions.)
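A rough back-of-the-envelope on why this works, assuming DeepSeek V3's published 671B total / 37B active parameters and a uniform 2-bit quant (an assumption for simplicity; real dynamic quants keep some layers at higher precision, so actual files are larger):

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF size estimate: parameter count times bits per weight.

    params_billion is in billions of parameters, so dividing bits by 8
    yields gigabytes directly.
    """
    return params_billion * bits_per_weight / 8

# Whole model on disk vs. the MoE expert weights actually touched per token.
total_gb = quant_size_gb(671, 2)   # full file, ~168 GB
active_gb = quant_size_gb(37, 2)   # weights read per token, ~9 GB

print(f"total = {total_gb} GB, active per token = {active_gb} GB")
```

Because llama.cpp memory-maps the file, pages for experts that a given token doesn't activate can stay on disk; that is how a 48-64GB machine can run it at all, just slowly, since cold experts must be paged in from storage.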
