VRAM requirements

#43
by otacilio-psf - opened

Hi, it's not clear to me how much VRAM I need to run this model. Since it has 6.6B active parameters, it should fit in 24 GB of VRAM, or am I wrong?

I have tried using vLLM.

Last question: is it possible to change the number of experts?

Microsoft org

Thanks for your interest!

MoE models still need to load all of their parameters, so you need enough memory to hold all 42B parameters.

Using only 6.6B active parameters at inference saves computation, not memory.
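As a rough back-of-the-envelope check (a sketch; the parameter counts come from this thread, and real usage adds KV cache and activation overhead on top of the weights):

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """GiB needed just to hold the model weights in memory."""
    return num_params * bytes_per_param / 1024**3

TOTAL_PARAMS = 42e9    # all experts must be resident in VRAM
ACTIVE_PARAMS = 6.6e9  # used per token; saves compute, not memory

# Weight footprint at common precisions (illustrative, weights only)
for dtype, nbytes in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = weight_memory_gib(TOTAL_PARAMS, nbytes)
    print(f"{dtype}: ~{gib:.0f} GiB")
```

At fp16 the weights alone are roughly 78 GiB, which is why the model cannot fit in 24 GB of VRAM even though only 6.6B parameters are active per token; only aggressive quantization (around 4-bit) would bring the weight footprint near a 24 GB card, before accounting for KV cache.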

nguyenbh changed discussion status to closed
