How did you do that? I understand using gguf with llama.cpp and control vectors. But how did you make the exl2 quant? Can you share your recipe?
It's pretty hacky Python code I wrote tbh, and it's annoying me. For example, I load up Mistral-Large with the sociopathy vector set high and all the characters come out psychotic, so when I want a different character personality I have to reload the model for scenes with that character in them, then switch back to the sociopathy version. At this point I might as well just fine-tune the model and create LoRAs.
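For anyone unfamiliar with what a control vector actually does at inference time: it's just a fixed direction added to the hidden states at chosen layers, scaled by a strength you pick. A minimal sketch of the idea (names like `apply_control_vector` are illustrative, not from any specific library):

```python
import numpy as np

def apply_control_vector(hidden_states, vector, scale):
    """Add a scaled steering direction to per-token hidden states."""
    # hidden_states: (seq_len, d_model); vector: (d_model,)
    return hidden_states + scale * vector

rng = np.random.default_rng(0)
d_model = 8
hidden = rng.standard_normal((4, d_model))
sociopathy = rng.standard_normal(d_model)  # hypothetical extracted direction

steered = apply_control_vector(hidden, sociopathy, scale=2.0)
neutral = apply_control_vector(hidden, sociopathy, scale=0.0)

# scale=0 leaves activations untouched, so "switching personalities"
# is just changing a scalar, no model reload needed
assert np.allclose(neutral, hidden)
assert not np.allclose(steered, hidden)
```

This is why baking a vector into quantized weights (as with an exl2 quant) is so inflexible: the scale gets frozen in, and changing it means requantizing or keeping multiple model copies.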
The only one I use consistently now is Mistral-Large with the language -> simple vector turned up, for when I just want it to convert things into stories with bland, beige prose.
You're better off using llama.cpp + control vectors: at least you don't need five copies of the same model, and swapping vectors loads faster since the model weights are already cached in system RAM.
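For reference, llama.cpp can attach control vectors at load time via CLI flags, so one GGUF serves every personality. A sketch of what that looks like (filenames and scale are made up; check your build's `--help` for the exact flag set):

```shell
# one base model, different vectors/strengths per launch
./llama-cli -m Mistral-Large-Q4_K_M.gguf \
    --control-vector-scaled sociopathy.gguf 1.5 \
    -p "Write the opening scene."

# same model, vector disabled: just drop the flag
./llama-cli -m Mistral-Large-Q4_K_M.gguf \
    -p "Write the opening scene."
```

Since the OS keeps the mmap'd weights in the page cache between runs, the second launch skips most of the disk I/O, which is why swapping vectors this way beats reloading separate baked-in quants.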