Configurations/Hardware specs for loading this model with vllm?
I'm having trouble loading this 7B model (TheBloke/WestLake-7B-v2-AWQ) with vLLM, but on the same hardware (a T4 with 15 GB of VRAM) I have no issue loading TheBloke/13B-Thorns-L2-AWQ.
Is there something I can do with the configuration? Could it be the max_seq_len=32768?
INFO 01-26 13:02:18 llm_engine.py:70] Initializing an LLM engine with config: model='TheBloke/WestLake-7B-v2-AWQ', tokenizer='TheBloke/WestLake-7B-v2-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=awq, enforce_eager=False, seed=0)
TheBloke/13B-Thorns-L2-AWQ
INFO 01-26 13:03:42 llm_engine.py:70] Initializing an LLM engine with config: model='TheBloke/13B-Thorns-L2-AWQ', tokenizer='TheBloke/13B-Thorns-L2-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=awq, enforce_eager=False, seed=0)
@wezfaas
Yeah, it's probably the max_seq_len you set. Lower it to 4096 as well.
Higher context = more VRAM: a longer maximum sequence length needs more KV-cache memory, and a 32768-token context is too much for a 15 GB T4, while 4096 fits.
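For reference, here's a minimal sketch of capping the context length when constructing the engine. The `max_model_len` parameter of vLLM's Python API overrides the 32768 value read from the model's config (which is where `max_seq_len` in the log comes from); exact behavior may vary by vLLM version, and this obviously needs a GPU to actually run, so treat it as a configuration sketch.

```python
from vllm import LLM, SamplingParams

# Cap the context window so the KV cache fits on a 15 GB T4.
# max_model_len=4096 overrides the 32768 default from the model config.
llm = LLM(
    model="TheBloke/WestLake-7B-v2-AWQ",
    quantization="awq",
    dtype="float16",   # T4 (compute capability 7.5) has no bfloat16 support
    max_model_len=4096,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

If you're launching the OpenAI-compatible API server instead, the equivalent flag is `--max-model-len 4096`.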