2024-07-10 18:08:24 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40007, worker_address='http://10.140.60.182:40007', controller_address='http://10.140.60.209:10075', model_path='share_internvl/InternVL2-40B/', model_name=None, device='auto', limit_model_concurrency=5, stream_interval=1, load_8bit=False)
2024-07-10 18:08:24 | INFO | model_worker | Loading the model InternVL2-40B on worker 7c887a ...
2024-07-10 18:08:24 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-10 18:08:24 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-10 18:08:30 | ERROR | stderr | /mnt/petrelfs/wangweiyun/miniconda3/envs/internvl/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
2024-07-10 18:08:30 | ERROR | stderr |   warnings.warn(
2024-07-10 18:08:33 | ERROR | stderr | Loading checkpoint shards:   0%|          | 0/17 [00:00<?, ?it/s]
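
Side note on the UserWarning at 18:08:30: it can be silenced by cleaning up the generation_config.json shipped with the checkpoint. Below is a minimal sketch, assuming the model directory from the worker args above ('share_internvl/InternVL2-40B/'); restoring top_p to its default, rather than enabling sampling, is an assumption about the intended decoding mode, not a fix taken from this log.

    from transformers import GenerationConfig

    model_path = "share_internvl/InternVL2-40B/"  # assumed: taken from the worker args above
    gen_cfg = GenerationConfig.from_pretrained(model_path)

    # Greedy decoding (do_sample=False) conflicts with sampling-only flags such as top_p.
    # Option 1: restore top_p to its default so the flag no longer applies.
    gen_cfg.top_p = 1.0
    # Option 2 (alternative): switch to sample-based generation instead.
    # gen_cfg.do_sample = True

    gen_cfg.save_pretrained(model_path)  # rewrites generation_config.json in place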