2024-07-10 18:05:34 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40004, worker_address='http://10.140.60.25:40004', controller_address='http://10.140.60.209:10075', model_path='share_internvl/InternVL2-8B/', model_name=None, device='cuda', limit_model_concurrency=5, stream_interval=1, load_8bit=False)
2024-07-10 18:05:34 | INFO | model_worker | Loading the model InternVL2-8B on worker 6223f6 ...
2024-07-10 18:05:35 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-10 18:05:35 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-10 18:05:36 | ERROR | stderr | /mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
2024-07-10 18:05:36 | ERROR | stderr |   warnings.warn(
2024-07-10 18:05:36 | ERROR | stderr | Loading checkpoint shards:   0%|          | 0/4 [00:00
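
The UserWarning above is advisory and the worker continues loading; it fires because the checkpoint's generation_config.json carries a `top_p` entry while `do_sample` is False. Below is a minimal sketch (not part of the log) of one way to clear it, assuming greedy decoding is intended; the model path is taken from the worker args above and should be adjusted to your own checkout.

    # Sketch: unset the conflicting sampling flag in the saved generation config.
    from transformers import GenerationConfig

    model_path = "share_internvl/InternVL2-8B/"  # from the Namespace in the log above

    gen_config = GenerationConfig.from_pretrained(model_path)
    if not gen_config.do_sample:
        # Restore the library default so the key is dropped from the saved JSON;
        # alternatively, set do_sample=True if sampling parameters should stay active.
        gen_config.top_p = 1.0
    gen_config.save_pretrained(model_path)
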