L3-8B-Lunaris-v1-exl2-6_5 struggles when the context goes past 8k
I have a 3090, so VRAM shouldn't be an issue, but when using SillyTavern with a lorebook for our guild I quickly hit 9k context. At that point the responses glitch and you get a ton of random or repeated text.
Both SillyTavern and text-generation-webui are set for 32k context.
Is the model card for this correct? At 8k and below it's a rather amazing model, so I wonder if the max context is actually 8k?
Llama 3 was only trained up to 8k context. If the author of this model claims it goes further, that's surprising; I'd hope they trained it enough to support it, but maybe not.
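If you want to sanity-check what a repo actually declares, you can read max_position_embeddings out of its config.json. Rough sketch below; I'm guessing at the base repo id this exl2 quant was made from, so swap in the real one:

```python
# Minimal sketch: read the declared context window from the model's
# config.json on the Hub. The repo id is an assumption -- replace it
# with the actual base repo this quant came from.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Sao10K/L3-8B-Lunaris-v1")

# Stock Llama 3 ships with max_position_embeddings = 8192, i.e. an 8k
# window; a genuine 32k model should report 32768 here (or carry a
# rope_scaling entry in its config).
print(cfg.max_position_embeddings)
```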
The model card for this model shows VRAM usage at (32k) for all the different quant versions. I assumed the 32k was the context?
Am I reading it wrong? Or is that part of the model page autogenerated and not actually showing it as supported?
It's only recently that I've had a need for a context higher than 8k, so it's possible I'm just not understanding the model cards correctly.
Oh that's just autogenerated and doesn't imply the model is capable of that, sorry!
You can try out Dolphin's Mistral tune, which has 32k support: https://huggingface.co/bartowski/dolphin-2.9.3-mistral-7B-32k-exl2
Np, I will bear that in mind when looking at models lol
Will take a look at the Dolphin model tomorrow; I think I have used it before... though not that specific variant. Looks like lower VRAM usage too, interesting. Cheers for the replies, and sorry for my confusion.