VRAM requirements
#43
by otacilio-psf - opened
Hi, it's not clear to me how much VRAM I need to run this model. Since it has 6.6B active parameters, it should fit in 24 GB of VRAM, or am I wrong?
I have tried using vLLM.
Last question: is it possible to change the number of experts?
Thanks for your interest!
An MoE model still needs to load all of its parameters, so you need enough memory to hold the full 42B parameters. The 6.6B active parameters save computation at inference time, not memory.
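The distinction above can be sketched with a quick back-of-the-envelope calculation. This is a rough estimate for the weights alone (KV cache and activations need extra memory on top); the parameter counts come from this thread, and the byte sizes per parameter for common precisions are standard values:

```python
def weight_vram_gb(num_params_billions: float, bytes_per_param: float) -> float:
    """Approximate GB of VRAM needed just to hold the model weights."""
    return num_params_billions * 1e9 * bytes_per_param / 1024**3

total_b = 42.0   # all experts must be resident in memory
active_b = 6.6   # only this many participate per token (saves compute, not memory)

# ~78 GB in fp16/bf16 (2 bytes/param) -- far more than a 24 GB card
print(f"fp16/bf16 weights: ~{weight_vram_gb(total_b, 2):.0f} GB")
# ~20 GB with 4-bit quantization (0.5 bytes/param)
print(f"int4 weights:      ~{weight_vram_gb(total_b, 0.5):.0f} GB")
```

So in half precision the full 42B parameters won't fit in 24 GB of VRAM, though aggressive quantization can bring the weights close to that budget.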
nguyenbh changed discussion status to closed