VRAM question
#9 by deleted - opened
Granted, I'm a newb, and if I could readily find this answer I wouldn't ask... but usually I can load GPTQ 13B models and 7Bx2 transformer models fine~
But this 11B model takes up a ton of VRAM. Also, people say you can load up to 30B models on 24 GB of VRAM, but I've never been able to stably run anything past 14B. Am I alone here with some sort of memory leak or what? I'm even running the web UI in qute to save VRAM, but I can't send too many tokens to OobaBooga with Moistral V3 or a CUDA out-of-memory error pops up. Forget about running XTTSv2 or 3D/Live2D characters~
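For what it's worth, my rough understanding of the math: 11B parameters in fp16 is about 22 GB of weights alone, which nearly fills a 24 GB card before the KV cache for the context, so the "30B on 24 GB" claims only hold for quantized weights (e.g. 4-bit, roughly 6 GB for 11B). Here's a minimal sketch of 4-bit loading with transformers + bitsandbytes, just to illustrate the idea; the repo id and settings are my guesses, not the exact OobaBooga loader config:

```python
# Rough sketch: load an 11B model with 4-bit weights via transformers + bitsandbytes.
# Assumes bitsandbytes is installed and a CUDA GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "TheDrummer/Moistral-11B-v3"  # placeholder repo id, substitute your own

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # ~0.5 bytes/param -> ~6 GB of weights for 11B
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spills layers to CPU RAM if VRAM runs out
)
```

With the weights at ~6 GB instead of ~22 GB, there should be headroom left on 24 GB for a longer context, XTTSv2, etc., unless something else is eating VRAM.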