n_ctx strange size
#3 opened by 010O11
When I normally use 32K context, it gives me >>> n_ctx 32848 = 6247.16 MiB
But with this model >>> llama_new_context_with_model: total VRAM used: 38378.45 MiB (model: 10055.54 MiB, context: 28322.91 MiB) [Q6_K TheBloke quant]
When you normally use 32K context, is that with a 7B Mistral-based model?
I believe more parameters --> more memory for the same amount of context, since the KV cache grows with the number of layers and attention heads. I may be wrong.
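The scaling above can be sketched with a rough KV-cache size estimate. This is a simplification of how llama.cpp allocates context memory (it ignores compute buffers, so real numbers run higher), and the hyperparameters below (layer counts, KV heads, head dim) are assumptions about typical Mistral-7B-style and Llama-2-13B-style architectures, not values read from this thread:

```python
def kv_cache_mib(n_layer: int, n_kv_heads: int, head_dim: int,
                 n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in MiB for an fp16 cache.

    Factor of 2 covers the separate K and V tensors; each stores
    n_layer * n_ctx * n_kv_heads * head_dim elements.
    """
    return 2 * n_layer * n_ctx * n_kv_heads * head_dim * bytes_per_elem / 1024**2

# Assumed Mistral-7B-like config: 32 layers, GQA with 8 KV heads, head_dim 128
print(kv_cache_mib(32, 8, 128, 32768))    # 4096.0 MiB

# Assumed Llama-2-13B-like config: 40 layers, full MHA (40 KV heads), head_dim 128
print(kv_cache_mib(40, 40, 128, 32768))   # 25600.0 MiB
```

The example illustrates why a larger model without grouped-query attention can need several times the context memory of a 7B Mistral at the same n_ctx: the cache scales with layers times KV heads, and GQA cuts the KV-head count sharply.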
Yeah, the 'normally' data are from 7B models. Is that huge a difference possible? Sorry then, I wasn't aware; I just thought it looked strangely big.
7B.Q8_0.GGUF n_ctx 32848 = 6247.16 MiB
4x7B.Q4_K_M.GGUF n_ctx 32848 = 6275.18 MiB