Some errors when deploying the lmdeploy api_server
Hi, thank you for your excellent work.
I was trying to deploy InternVL3.5 with lmdeploy, but I ran into some errors.
First, I created an lmdeploy service using the command below:
lmdeploy serve api_server OpenGVLab/InternVL3_5-2B --server-port 23333 --tp 1
But I received the error below:
Traceback (most recent call last):
File "/home/xx/miniconda3/envs/lmdeploy/bin/lmdeploy", line 7, in <module>
sys.exit(run())
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 39, in run
args.run(args)
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/cli/serve.py", line 371, in api_server
run_api_server(args.model_path,
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 1352, in serve
VariableInterface.async_engine = pipeline_class(model_path=model_path,
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/vl_async_engine.py", line 32, in __init__
super().__init__(model_path, backend=backend, backend_config=backend_config, **kwargs)
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 284, in __init__
self._build_turbomind(model_path=model_path, backend_config=backend_config, **kwargs)
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 338, in _build_turbomind
self.engine = tm.TurboMind.from_pretrained(model_path,
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 386, in from_pretrained
return cls(model_path=pretrained_model_name_or_path,
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 159, in __init__
self._load_weights(model_source)
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 178, in _load_weights
self._tm_model.export()
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 210, in export
for i, reader in self.input_model.readers():
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/source_model/llama.py", line 113, in readers
reader = self.Reader(param, {}, False, self.model_config, policy=self.policy)
File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/source_model/internvl.py", line 64, in __init__
raise ValueError(f'Miss "text_config" in model config: {model_cfg}')
ValueError: Miss "text_config" in model config: None
It seems like something is missing in config.json. I've tried lmdeploy 0.9.1, 0.9.2, and 0.9.2.post1, but the result is the same.
After checking some other models on Hugging Face with "architectures": ["Qwen3ForCausalLM"], I found that the llm_config key in config.json should be text_config, so I modified the content manually, roughly as in the sketch below.
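For reference, a minimal sketch of the rename (the config path is a placeholder for a locally downloaded copy of the model):
import json

cfg_path = "/path/to/InternVL3_5-2B/config.json"  # placeholder: local model directory

with open(cfg_path) as f:
    cfg = json.load(f)

# Rename the key so the TurboMind InternVL reader finds "text_config"
if "llm_config" in cfg and "text_config" not in cfg:
    cfg["text_config"] = cfg.pop("llm_config")

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)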
After that, the lmdeploy api service could finally be started. But when I try to send a question, I only get garbled text.
I used the example code in the README (I set max_tokens to 256, otherwise it would not stop generating tokens). A minimal sketch of the call, assuming the default OpenAI-compatible endpoint and a placeholder image URL:
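from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},  # placeholder
        ],
    }],
    max_tokens=256,
)
print(response)
It responded like below: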
ChatCompletion(id='2', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content=" unbiased”)_SCDie”) Bush};\n\n\n\n\tcanvas neighbouring festivitiesIMljDie PF PF由 ruggedDie/full.card_SC>';\n\n\tcanvas};\n\n\n\nIM rugged Archeriliki_SC\tcanvas iPad Jeans.card neighbouring iPad Bush neighbouringiliki festivities Bush neighbouring pData(_('_SCIM Archer DataGridViewCellStyle”)由 Jeans};\n\n\n\n>';\n\n.pictureBox rugged_SC acres_SC PFfi neighbouringiliki neighbouring MERCHANTABILITY acreslj由 Bush '/';\n acres acresotate_SC.H.pictureBoxIMIM MERCHANTABILITYSummer.pictureBox由fi festivitiesiliki.card_SCfi};\n\n\n\nSummer ArcherSummer};\n\n\n\n PF/full DataGridViewCellStyleljSummerSummer MERCHANTABILITYfifi.Hiliki ruggedlj Bush iPad iPad Bush.pictureBox DataGridViewCellStyleIM MERCHANTABILITY”)>';\n\n>';\n\n PF PF_SC iPad unbiased iPad rugged pData”)”)IM};\n\n\n\nlj pData\tcanvas.cardotate DataGridViewCellStyle acres iPad neighbouring”)};\n\n\n\n Bush MERCHANTABILITY.H Department neighbouring '/';\n ArcherDiefi(_(' PF Jeans};\n\n\n\n Department acres\tcanvasotate PF rugged”) PFSummer\tcanvas/fulliliki Department.pictureBox PF.cardSummer.card由(_(' Bush.H.card由”) DataGridViewCellStyleiliki\tcanvas\tcanvasiliki MERCHANTABILITYIM PFlj rugged};\n\n\n\n(_('IM neighbouring/full由 MERCHANTABILITYSummer pDataDie.H iPad Bush rugged Bush DataGridViewCellStyle.cardIM};\n\n\n\n Jeansfi festivities neighbouring Archer.HDie Department pData Jeans pData/full iPad\tcanvas\tcanvas DataGridViewCellStyle neighbouring/full '/';\n”) rugged unbiasedfilj unbiasedilikiIM MERCHANTABILITYIMlj Jeans MERCHANTABILITY acres.pictureBox ruggedIMiliki acresfi pData rugged.Hljfi.card};\n\n\n\n/full.H PFfi};\n\n\n\n '/';\n", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content=None))], created=1756191901, model='OpenGVLab/InternVL3_5-2B', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=257, prompt_tokens=1843, total_tokens=2100, completion_tokens_details=None, prompt_tokens_details=None))
If I use lmdeploy locally, without the api_server, it responds normally.
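For context, this is the kind of local usage I mean (a minimal sketch with lmdeploy's pipeline API; the image URL is a placeholder):
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL3_5-2B')
image = load_image('https://example.com/image.jpg')  # placeholder
response = pipe(('Describe this image.', image))
print(response.text)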
Have you tried downloading the model to your local machine?
Please use the PyTorch backend by setting --backend to pytorch.
Please use this command to deploy the model. We will update our README accordingly; thank you for the reminder.
lmdeploy serve api_server OpenGVLab/InternVL3_5-2B --server-port 23333 --tp 1 --backend pytorch
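Equivalently, for local (non-server) use, the backend can be selected in the Python API (a sketch using PytorchEngineConfig from lmdeploy):
from lmdeploy import pipeline, PytorchEngineConfig

# Select the PyTorch engine explicitly instead of the default TurboMind one
pipe = pipeline('OpenGVLab/InternVL3_5-2B', backend_config=PytorchEngineConfig(tp=1))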
Thank you very much for your reply. The deployment is working now.
Will InternVL3.5 support the TurboMind backend in lmdeploy in the future?
lmdeploy will support the TurboMind backend for InternVL3.5 next month.
Hi, huge thanks for your monumental work in the multimodal space, truly inspiring :)
I am writing because I have an issue running inference with InternVL3_5-241B-A28B on an 8xA100 node. I tried both vllm and lmdeploy with no luck; every time I get a VRAM OOM after a few forward passes. Could you maybe share your setup, env, etc., so I could try to reproduce?
Again, great work!
I am able to successfully deploy it on an 8xH800 node using this command. If you encounter errors during inference, you can try reducing gpu_memory_utilization in vLLM or cache_max_entry_count in LMDeploy, for example lowering it to 0.3 (see the example flag after the command below).
lmdeploy serve api_server OpenGVLab/InternVL3_5-241B-A28B --server-port 23333 --tp 8 --backend pytorch
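For example, with LMDeploy's standard flag that would be:
lmdeploy serve api_server OpenGVLab/InternVL3_5-241B-A28B --server-port 23333 --tp 8 --backend pytorch --cache-max-entry-count 0.3
(In vLLM, the corresponding knob is the --gpu-memory-utilization flag.)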
h800 or h100? @Weiyun1025
h800