Some errors when deploying the lmdeploy api_server

#2
by banne2266 - opened

Hi, thank you for your excellent work!
I was trying to deploy InternVL3.5 with lmdeploy, but I ran into some errors.

First, I created an lmdeploy service using the command below:

lmdeploy serve api_server OpenGVLab/InternVL3_5-2B --server-port 23333 --tp 1

But I received the following error:

Traceback (most recent call last):
  File "/home/xx/miniconda3/envs/lmdeploy/bin/lmdeploy", line 7, in <module>
    sys.exit(run())
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 39, in run
    args.run(args)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/cli/serve.py", line 371, in api_server
    run_api_server(args.model_path,
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 1352, in serve
    VariableInterface.async_engine = pipeline_class(model_path=model_path,
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/vl_async_engine.py", line 32, in __init__
    super().__init__(model_path, backend=backend, backend_config=backend_config, **kwargs)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 284, in __init__
    self._build_turbomind(model_path=model_path, backend_config=backend_config, **kwargs)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 338, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(model_path,
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 386, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 159, in __init__
    self._load_weights(model_source)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 178, in _load_weights
    self._tm_model.export()
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 210, in export
    for i, reader in self.input_model.readers():
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/source_model/llama.py", line 113, in readers
    reader = self.Reader(param, {}, False, self.model_config, policy=self.policy)
  File "/home/xx/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/source_model/internvl.py", line 64, in __init__
    raise ValueError(f'Miss "text_config" in model config: {model_cfg}')
ValueError: Miss "text_config" in model config: None

It seems like something is missing in config.json. I've tried lmdeploy 0.9.1/0.9.2/0.9.2.post1, but the result is the same.

After checking some other models on Hugging Face with "architectures": ["Qwen3ForCausalLM"], it looks like the llm_config key in config.json should be named text_config, so I modified the content manually.
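Concretely, the fix was just renaming that key; something like this throwaway script (the path is a placeholder, point it at the config.json inside your local copy of the checkpoint):

import json

# Placeholder path: the config.json in the locally downloaded
# OpenGVLab/InternVL3_5-2B checkpoint directory.
path = 'config.json'

with open(path) as f:
    cfg = json.load(f)

# Rename llm_config -> text_config, which the TurboMind loader expects.
if 'llm_config' in cfg and 'text_config' not in cfg:
    cfg['text_config'] = cfg.pop('llm_config')

with open(path, 'w') as f:
    json.dump(cfg, f, indent=2)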
After that, the lmdeploy API service could finally be started. But whenever I sent a question, I only got garbled text back.
I used the example code from the README, with max_tokens set to 256 (otherwise it never stops generating tokens).
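Roughly, the request looked like this (a sketch using the standard OpenAI Python client pointed at the api_server; the image URL is a placeholder):

from openai import OpenAI

# Client pointed at the local lmdeploy api_server started above.
client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image.'},
            {'type': 'image_url', 'image_url': {'url': 'https://example.com/image.jpg'}},
        ],
    }],
    max_tokens=256,  # without this, generation never stopped
)
print(response)

The response came back as garbled text like this: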

ChatCompletion(id='2', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content=" unbiased”)_SCDie”) Bush};\n\n\n\n\tcanvas neighbouring festivitiesIMljDie PF PF由 ruggedDie/full.card_SC>';\n\n\tcanvas};\n\n\n\nIM rugged Archeriliki_SC\tcanvas iPad Jeans.card neighbouring iPad Bush neighbouringiliki festivities Bush neighbouring pData(_('_SCIM Archer DataGridViewCellStyle”)由 Jeans};\n\n\n\n>';\n\n.pictureBox rugged_SC acres_SC PFfi neighbouringiliki neighbouring MERCHANTABILITY acreslj由 Bush '/';\n acres acresotate_SC.H.pictureBoxIMIM MERCHANTABILITYSummer.pictureBox由fi festivitiesiliki.card_SCfi};\n\n\n\nSummer ArcherSummer};\n\n\n\n PF/full DataGridViewCellStyleljSummerSummer MERCHANTABILITYfifi.Hiliki ruggedlj Bush iPad iPad Bush.pictureBox DataGridViewCellStyleIM MERCHANTABILITY”)>';\n\n>';\n\n PF PF_SC iPad unbiased iPad rugged pData”)”)IM};\n\n\n\nlj pData\tcanvas.cardotate DataGridViewCellStyle acres iPad neighbouring”)};\n\n\n\n Bush MERCHANTABILITY.H Department neighbouring '/';\n ArcherDiefi(_(' PF Jeans};\n\n\n\n Department acres\tcanvasotate PF rugged”) PFSummer\tcanvas/fulliliki Department.pictureBox PF.cardSummer.card由(_(' Bush.H.card由”) DataGridViewCellStyleiliki\tcanvas\tcanvasiliki MERCHANTABILITYIM PFlj rugged};\n\n\n\n(_('IM neighbouring/full由 MERCHANTABILITYSummer pDataDie.H iPad Bush rugged Bush DataGridViewCellStyle.cardIM};\n\n\n\n Jeansfi festivities neighbouring Archer.HDie Department pData Jeans pData/full iPad\tcanvas\tcanvas DataGridViewCellStyle neighbouring/full '/';\n”) rugged unbiasedfilj unbiasedilikiIM MERCHANTABILITYIMlj Jeans MERCHANTABILITY acres.pictureBox ruggedIMiliki acresfi pData rugged.Hljfi.card};\n\n\n\n/full.H PFfi};\n\n\n\n '/';\n", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content=None))], created=1756191901, model='OpenGVLab/InternVL3_5-2B', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=257, prompt_tokens=1843, total_tokens=2100, completion_tokens_details=None, prompt_tokens_details=None))

If I use lmdeploy locally, without the api_server, it responds normally.
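For comparison, the local usage that works looks roughly like this (a sketch based on lmdeploy's pipeline API; the prompt and image are placeholders):

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Same checkpoint, but driven directly through the pipeline API
# instead of going through the api_server.
pipe = pipeline('OpenGVLab/InternVL3_5-2B')
image = load_image('https://example.com/image.jpg')  # placeholder image
response = pipe(('Describe this image.', image))
print(response.text)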

OpenGVLab org

Have you tried downloading the model to your local machine?

OpenGVLab org

Please use the PyTorch backend by setting --backend to pytorch.

OpenGVLab org

Please use this command to deploy the model. We will update our README accordingly. Thank you for the reminder.

lmdeploy serve api_server OpenGVLab/InternVL3_5-2B --server-port 23333 --tp 1 --backend pytorch

Thank you very much for your reply. The deployment is working now.
Will InternVL3.5 support the TurboMind backend in lmdeploy in the future?

OpenGVLab org

lmdeploy will support the TurboMind backend for InternVL3.5 next month.

Hi, huge thanks for your monumental work in the multimodal space, truly inspiring :)
I'm writing because I have an issue running inference with InternVL3_5-241B-A28B on an 8xA100 node. I tried both vLLM and lmdeploy with no luck; every time I get a VRAM OOM after a few forward passes. Could you maybe share your setup, environment, etc. so I can try to reproduce?
Again great work!

OpenGVLab org

I am able to deploy it successfully on an 8xH800 node using the command below. If you encounter errors during inference, you can try reducing gpu_memory_utilization in vLLM or cache_max_entry_count in LMDeploy, for example lowering it to 0.3.

lmdeploy serve api_server OpenGVLab/InternVL3_5-241B-A28B --server-port 23333 --tp 8 --backend pytorch
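For instance, the KV-cache ratio can be lowered directly from the CLI, something like this (the --cache-max-entry-count flag corresponds to cache_max_entry_count; adjust the value to your hardware):

lmdeploy serve api_server OpenGVLab/InternVL3_5-241B-A28B --server-port 23333 --tp 8 --backend pytorch --cache-max-entry-count 0.3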

H800 or H100? @Weiyun1025

H800
