n_ctx strange size
#3 opened by 010O11
When I normally use 32K context, it gives me >>> n_ctx 32848 = 6247.16 MiB
But with this model >>> llama_new_context_with_model: total VRAM used: 38378.45 MiB (model: 10055.54 MiB, context: 28322.91 MiB) [Q6_K TheBloke quant]
When you normally use 32K context, is that with a 7B Mistral-based model?
I believe more parameters --> more memory for the same amount of context, since the KV cache grows with the number of layers and attention heads. I may be wrong.
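The scaling above can be sketched with a rough KV-cache size estimate. This is a simplification of how llama.cpp allocates context memory (it ignores compute buffers, so real numbers run higher), and the hyperparameters below (layer counts, KV heads, head dim) are assumptions about typical Mistral-7B-style and Llama-2-13B-style architectures, not values read from this thread:

```python
def kv_cache_mib(n_layer: int, n_kv_heads: int, head_dim: int,
                 n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in MiB for an fp16 cache.

    Factor of 2 covers the separate K and V tensors; each stores
    n_layer * n_ctx * n_kv_heads * head_dim elements.
    """
    return 2 * n_layer * n_ctx * n_kv_heads * head_dim * bytes_per_elem / 1024**2

# Assumed Mistral-7B-like config: 32 layers, GQA with 8 KV heads, head_dim 128
print(kv_cache_mib(32, 8, 128, 32768))    # 4096.0 MiB

# Assumed Llama-2-13B-like config: 40 layers, full MHA (40 KV heads), head_dim 128
print(kv_cache_mib(40, 40, 128, 32768))   # 25600.0 MiB
```

The example illustrates why a larger model without grouped-query attention can need several times the context memory of a 7B Mistral at the same n_ctx: the cache scales with layers times KV heads, and GQA cuts the KV-head count sharply.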
Yeah, the 'normally' data are from 7B models. Is that huge a difference possible? Sorry then, I wasn't aware; I just thought it looked strangely big.
7B.Q8_0.GGUF n_ctx 32848 = 6247.16 MiB
4x7B.Q4_K_M.GGUF n_ctx 32848 = 6275.18 MiB