Why does Deepseek 67B have a bigger file size than 70B models?

#1
by OrangeApples - opened

@LoneStriker I'm comparing your 2.65bpw quants of Deepseek 67B and Euryale 1.4 L2 70B, and Deepseek's is about 700 MB bigger. However, when I checked TheBloke's GGUF quants of both, the opposite was true, which is what I expected. Not really a big deal, but I'm curious about what caused this.

I'm not sure, actually; that's something we'd need to ask the author of Exllamav2 about. I have anecdotally noticed, however, that Deepseek also seems to take more VRAM than L2 70B under exl2, so the larger file size is reflected in VRAM usage as well. As to why? My guess would be the tokenizer: a ~100K vocabulary vs. Llama 2's 32K.
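The vocabulary guess is easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch, assuming both models use a hidden size of 8192 (as their published configs state) and untied embedding/output matrices, with Deepseek 67B's vocabulary at ~102K tokens vs. Llama 2's 32K:

```python
# Rough estimate of the extra parameters a larger vocabulary adds.
# Assumptions (from the published model configs, not from this thread):
#   - both models use hidden_size = 8192
#   - embeddings and the lm_head are separate (untied) matrices

def vocab_params(vocab_size: int, hidden_size: int, tied: bool = False) -> int:
    """Parameters in the token embedding plus the output head."""
    matrices = 1 if tied else 2
    return vocab_size * hidden_size * matrices

deepseek_67b = vocab_params(102_400, 8192)  # ~102K-token vocab
llama2_70b = vocab_params(32_000, 8192)     # 32K-token vocab

extra = deepseek_67b - llama2_70b
print(f"extra parameters from the larger vocab: {extra:,}")
# ~1.15B extra parameters in the embedding + head alone
```

If the quantizer stores those matrices at higher precision than the 2.65bpw transformer body (quantization schemes often do, since embeddings are sensitive), an extra ~1.15B parameters there could plausibly account for several hundred MB of file size, which is in the ballpark of the 700 MB difference observed.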