2024-07-10 18:02:50 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40006, worker_address='http://10.140.60.25:40006', controller_address='http://10.140.60.209:10075', model_path='share_internvl/InternVL2-40B/', model_name=None, device='auto', limit_model_concurrency=5, stream_interval=1, load_8bit=False)
2024-07-10 18:02:50 | INFO | model_worker | Loading the model InternVL2-40B on worker 30a4c1 ...
2024-07-10 18:02:50 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-10 18:02:50 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-10 18:02:52 | ERROR | stderr | /mnt/petrelfs/wangweiyun/miniconda3/envs/internvl/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
2024-07-10 18:02:52 | ERROR | stderr | warnings.warn(
2024-07-10 18:02:56 | ERROR | stderr | Loading checkpoint shards:   0%|          | 0/17 [00:00<?, ?it/s]
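
The UserWarning above suggests the checkpoint's generation_config.json carries sampling-only flags while do_sample is False. A minimal sketch of one way to fix the file, assuming the generation_config.json under the logged model_path is the file "that should be fixed"; the path is copied from the worker args, not verified here:

# Sketch: repair an inconsistent generation config for the checkpoint above.
# Assumption: model_path matches the worker args from the log; adjust as needed.
from transformers import GenerationConfig

model_path = "share_internvl/InternVL2-40B/"  # model_path from the worker args

# Load the possibly inconsistent generation config shipped with the checkpoint.
gen_config = GenerationConfig.from_pretrained(model_path)

# The warning fires because a sampling-only flag is set while do_sample=False.
# Either enable sampling so top_p is meaningful, or restore top_p to its
# default and stay greedy.
gen_config.do_sample = True   # option 1: enable sample-based generation
# gen_config.top_p = 1.0      # option 2: stay greedy, restore the default

gen_config.save_pretrained(model_path)  # rewrites generation_config.json

After saving, restarting the model worker should load the corrected config without emitting the UserWarning.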