Some errors when deploying the lmdeploy api_server

#2
by banne2266 - opened

Hi, thank you for your excellent work!
I was trying to deploy InternVL3.5 with lmdeploy, but I ran into some errors.

First, I created an lmdeploy service using the command below:

lmdeploy serve api_server OpenGVLab/InternVL3_5-2B --server-port 23333 --tp 1

But I received the following error:

Traceback (most recent call last):
  File "/home/xx/miniconda3/envs/lmdeploy/bin/lmdeploy", line 7, in <module>
    sys.exit(run())
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 39, in run
    args.run(args)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/cli/serve.py", line 371, in api_server
    run_api_server(args.model_path,
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 1352, in serve
    VariableInterface.async_engine = pipeline_class(model_path=model_path,
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/vl_async_engine.py", line 32, in __init__
    super().__init__(model_path, backend=backend, backend_config=backend_config, **kwargs)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 284, in __init__
    self._build_turbomind(model_path=model_path, backend_config=backend_config, **kwargs)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 338, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(model_path,
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 386, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 159, in __init__
    self._load_weights(model_source)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 178, in _load_weights
    self._tm_model.export()
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 210, in export
    for i, reader in self.input_model.readers():
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/source_model/llama.py", line 113, in readers
    reader = self.Reader(param, {}, False, self.model_config, policy=self.policy)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/source_model/internvl.py", line 64, in __init__
    raise ValueError(f'Miss "text_config" in model config: {model_cfg}')
ValueError: Miss "text_config" in model config: None

It seems like something is missing in config.json. I've tried lmdeploy 0.9.1/0.9.2/0.9.2.post1, but the result is the same.

After checking some other models on Hugging Face with "architectures": ["Qwen3ForCausalLM"], it looks like the llm_config key in config.json should be named text_config, so I modified the content manually.
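Concretely, the fix was just renaming that key; something like this throwaway script (the path is a placeholder, point it at the config.json inside your local copy of the checkpoint):

import json

# Placeholder path: the config.json in the locally downloaded
# OpenGVLab/InternVL3_5-2B checkpoint directory.
path = 'config.json'

with open(path) as f:
    cfg = json.load(f)

# Rename llm_config -> text_config, which the TurboMind loader expects.
if 'llm_config' in cfg and 'text_config' not in cfg:
    cfg['text_config'] = cfg.pop('llm_config')

with open(path, 'w') as f:
    json.dump(cfg, f, indent=2)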
After that, the lmdeploy API service could finally be started. But whenever I sent a question, I only got garbled text back.
I used the example code from the README, with max_tokens set to 256 (otherwise it never stops generating tokens).
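Roughly, the request looked like this (a sketch using the standard OpenAI Python client pointed at the api_server; the image URL is a placeholder):

from openai import OpenAI

# Client pointed at the local lmdeploy api_server started above.
client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image.'},
            {'type': 'image_url', 'image_url': {'url': 'https://example.com/image.jpg'}},
        ],
    }],
    max_tokens=256,  # without this, generation never stopped
)
print(response)

The response came back as garbled text like this: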

ChatCompletion(id='2', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content=" unbiased”)_SCDie”) Bush};\n\n\n\n\tcanvas neighbouring festivitiesIMljDie PF PF由 ruggedDie/full.card_SC>';\n\n\tcanvas};\n\n\n\nIM rugged Archeriliki_SC\tcanvas iPad Jeans.card neighbouring iPad Bush neighbouringiliki festivities Bush neighbouring pData(_('_SCIM Archer DataGridViewCellStyle”)由 Jeans};\n\n\n\n>';\n\n.pictureBox rugged_SC acres_SC PFfi neighbouringiliki neighbouring MERCHANTABILITY acreslj由 Bush '/';\n acres acresotate_SC.H.pictureBoxIMIM MERCHANTABILITYSummer.pictureBox由fi festivitiesiliki.card_SCfi};\n\n\n\nSummer ArcherSummer};\n\n\n\n PF/full DataGridViewCellStyleljSummerSummer MERCHANTABILITYfifi.Hiliki ruggedlj Bush iPad iPad Bush.pictureBox DataGridViewCellStyleIM MERCHANTABILITY”)>';\n\n>';\n\n PF PF_SC iPad unbiased iPad rugged pData”)”)IM};\n\n\n\nlj pData\tcanvas.cardotate DataGridViewCellStyle acres iPad neighbouring”)};\n\n\n\n Bush MERCHANTABILITY.H Department neighbouring '/';\n ArcherDiefi(_(' PF Jeans};\n\n\n\n Department acres\tcanvasotate PF rugged”) PFSummer\tcanvas/fulliliki Department.pictureBox PF.cardSummer.card由(_(' Bush.H.card由”) DataGridViewCellStyleiliki\tcanvas\tcanvasiliki MERCHANTABILITYIM PFlj rugged};\n\n\n\n(_('IM neighbouring/full由 MERCHANTABILITYSummer pDataDie.H iPad Bush rugged Bush DataGridViewCellStyle.cardIM};\n\n\n\n Jeansfi festivities neighbouring Archer.HDie Department pData Jeans pData/full iPad\tcanvas\tcanvas DataGridViewCellStyle neighbouring/full '/';\n”) rugged unbiasedfilj unbiasedilikiIM MERCHANTABILITYIMlj Jeans MERCHANTABILITY acres.pictureBox ruggedIMiliki acresfi pData rugged.Hljfi.card};\n\n\n\n/full.H PFfi};\n\n\n\n '/';\n", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content=None))], created=1756191901, model='OpenGVLab/InternVL3_5-2B', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=257, prompt_tokens=1843, total_tokens=2100, completion_tokens_details=None, prompt_tokens_details=None))

If I use lmdeploy locally, without the api_server, it responds normally.
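For comparison, the local usage that works looks roughly like this (a sketch based on lmdeploy's pipeline API; the prompt and image are placeholders):

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Same checkpoint, but driven directly through the pipeline API
# instead of going through the api_server.
pipe = pipeline('OpenGVLab/InternVL3_5-2B')
image = load_image('https://example.com/image.jpg')  # placeholder image
response = pipe(('Describe this image.', image))
print(response.text)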

OpenGVLab org

Have you tried downloading the model to your local machine?

OpenGVLab org

Please use the PyTorch backend by setting --backend to pytorch.

OpenGVLab org

Please use this command to deploy the model. We will update our README accordingly. Thank you for the reminder.

lmdeploy serve api_server OpenGVLab/InternVL3_5-2B --server-port 23333 --tp 1 --backend pytorch

Thank you very much for your reply. The deployment is working now.
Will InternVL3.5 support the TurboMind backend in lmdeploy in the future?

OpenGVLab org

lmdeploy will support the TurboMind backend for InternVL3.5 next month.

Hi, huge thanks for your monumental work in the multimodal space, truly inspiring :)
I'm writing because I have an issue running inference with InternVL3_5-241B-A28B on an 8xA100 node. I tried both vLLM and lmdeploy with no luck; every time I get a VRAM OOM after a few forward passes. Could you maybe share your setup, environment, etc. so I can try to reproduce?
Again great work!

OpenGVLab org

I am able to deploy it successfully on an 8xH800 node using the command below. If you encounter errors during inference, you can try reducing gpu_memory_utilization in vLLM or cache_max_entry_count in LMDeploy, for example lowering it to 0.3.

lmdeploy serve api_server OpenGVLab/InternVL3_5-241B-A28B --server-port 23333 --tp 8 --backend pytorch
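For instance, the KV-cache ratio can be lowered directly from the CLI, something like this (the --cache-max-entry-count flag corresponds to cache_max_entry_count; adjust the value to your hardware):

lmdeploy serve api_server OpenGVLab/InternVL3_5-241B-A28B --server-port 23333 --tp 8 --backend pytorch --cache-max-entry-count 0.3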

H800 or H100? @Weiyun1025

H800
