Passing --max-model-len expected to work
#7
opened by bakbeest
Is --max-model-len still expected to work with vLLM when using --tokenizer_mode mistral --config_format mistral --load_format mistral? It seems not, as I'm OOM'ing at sizes I should be able to run.
I'm just GPU-poor, trying to run this AWQ quant on 4x4090s, and I can't load the full context length. I can run the model at a decent context length if I drop the mistral flags, but then tool calling won't work.
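For context, the invocation looks roughly like this; the model path, context length, and tensor-parallel size below are placeholders, not my exact values:

```bash
# Rough sketch of the command in question (placeholder model path and sizes)
vllm serve some-org/Some-Model-AWQ \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --tensor-parallel-size 4 \
  --max-model-len 32768
```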