2024-07-11 22:32:29 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40007, worker_address='http://10.140.66.196:40007', controller_address='http://10.140.60.209:10075', model_path='share_internvl/InternVL2-78B/', model_name=None, device='auto', limit_model_concurrency=5, stream_interval=1, load_8bit=False)
2024-07-11 22:32:29 | INFO | model_worker | Loading the model InternVL2-78B on worker 03bfe8 ...
2024-07-11 22:32:29 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-11 22:32:29 | WARNING | transformers.tokenization_utils_base | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-11 22:32:33 | ERROR | stderr | 
Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]
2024-07-11 22:32:36 | ERROR | stderr | 
Loading checkpoint shards:   3%|▎         | 1/33 [00:02<01:28,  2.78s/it]
2024-07-11 22:32:38 | ERROR | stderr | 
Loading checkpoint shards:   6%|▌         | 2/33 [00:04<01:10,  2.26s/it]
2024-07-11 22:32:40 | ERROR | stderr | 
Loading checkpoint shards:   9%|▉         | 3/33 [00:06<01:02,  2.09s/it]
2024-07-11 22:32:42 | ERROR | stderr | 
Loading checkpoint shards:  12%|█▏        | 4/33 [00:08<00:59,  2.05s/it]
2024-07-11 22:32:44 | ERROR | stderr | 
Loading checkpoint shards:  15%|█▌        | 5/33 [00:10<00:56,  2.02s/it]
2024-07-11 22:32:46 | ERROR | stderr | 
Loading checkpoint shards:  18%|█▊        | 6/33 [00:12<00:54,  2.00s/it]
2024-07-11 22:32:48 | ERROR | stderr | 
Loading checkpoint shards:  21%|██        | 7/33 [00:14<00:56,  2.16s/it]
2024-07-11 22:32:50 | ERROR | stderr | 
Loading checkpoint shards:  24%|██▍       | 8/33 [00:16<00:52,  2.11s/it]
2024-07-11 22:32:53 | ERROR | stderr | 
Loading checkpoint shards:  27%|██▋       | 9/33 [00:19<00:55,  2.30s/it]
2024-07-11 22:32:55 | ERROR | stderr | 
Loading checkpoint shards:  30%|███       | 10/33 [00:21<00:50,  2.19s/it]
2024-07-11 22:32:57 | ERROR | stderr | 
Loading checkpoint shards:  33%|███▎      | 11/33 [00:23<00:46,  2.10s/it]
2024-07-11 22:32:59 | ERROR | stderr | 
Loading checkpoint shards:  36%|███▋      | 12/33 [00:25<00:42,  2.01s/it]
2024-07-11 22:33:01 | ERROR | stderr | 
Loading checkpoint shards:  39%|███▉      | 13/33 [00:27<00:39,  1.96s/it]
2024-07-11 22:33:03 | ERROR | stderr | 
Loading checkpoint shards:  42%|████▏     | 14/33 [00:29<00:36,  1.94s/it]
2024-07-11 22:33:04 | ERROR | stderr | 
Loading checkpoint shards:  45%|████▌     | 15/33 [00:30<00:34,  1.92s/it]
2024-07-11 22:33:06 | ERROR | stderr | 
Loading checkpoint shards:  48%|████▊     | 16/33 [00:32<00:32,  1.91s/it]
2024-07-11 22:33:08 | ERROR | stderr | 
Loading checkpoint shards:  52%|█████▏    | 17/33 [00:34<00:30,  1.89s/it]
2024-07-11 22:33:10 | ERROR | stderr | 
Loading checkpoint shards:  55%|█████▍    | 18/33 [00:36<00:28,  1.90s/it]
2024-07-11 22:33:12 | ERROR | stderr | 
Loading checkpoint shards:  58%|█████▊    | 19/33 [00:38<00:26,  1.88s/it]
2024-07-11 22:33:14 | ERROR | stderr | 
Loading checkpoint shards:  61%|██████    | 20/33 [00:40<00:24,  1.88s/it]
2024-07-11 22:33:16 | ERROR | stderr | 
Loading checkpoint shards:  64%|██████▎   | 21/33 [00:42<00:22,  1.88s/it]
2024-07-11 22:33:18 | ERROR | stderr | 
Loading checkpoint shards:  67%|██████▋   | 22/33 [00:44<00:20,  1.90s/it]
2024-07-11 22:33:20 | ERROR | stderr | 
Loading checkpoint shards:  70%|██████▉   | 23/33 [00:46<00:19,  1.91s/it]
2024-07-11 22:33:22 | ERROR | stderr | 
Loading checkpoint shards:  73%|███████▎  | 24/33 [00:48<00:17,  1.96s/it]
2024-07-11 22:33:24 | ERROR | stderr | 
Loading checkpoint shards:  76%|███████▌  | 25/33 [00:50<00:15,  1.99s/it]
2024-07-11 22:33:26 | ERROR | stderr | 
Loading checkpoint shards:  79%|███████▉  | 26/33 [00:52<00:13,  1.99s/it]
2024-07-11 22:33:28 | ERROR | stderr | 
Loading checkpoint shards:  82%|████████▏ | 27/33 [00:54<00:11,  1.95s/it]
2024-07-11 22:33:29 | ERROR | stderr | 
Loading checkpoint shards:  85%|████████▍ | 28/33 [00:55<00:09,  1.93s/it]
2024-07-11 22:33:31 | ERROR | stderr | 
Loading checkpoint shards:  88%|████████▊ | 29/33 [00:57<00:07,  1.94s/it]
2024-07-11 22:33:33 | ERROR | stderr | 
Loading checkpoint shards:  91%|█████████ | 30/33 [01:00<00:05,  1.99s/it]
2024-07-11 22:33:36 | ERROR | stderr | 
Loading checkpoint shards:  94%|█████████▍| 31/33 [01:02<00:04,  2.00s/it]
2024-07-11 22:33:37 | ERROR | stderr | 
Loading checkpoint shards:  97%|█████████▋| 32/33 [01:03<00:01,  1.93s/it]
2024-07-11 22:33:39 | ERROR | stderr | 
Loading checkpoint shards: 100%|██████████| 33/33 [01:05<00:00,  1.74s/it]
2024-07-11 22:33:39 | ERROR | stderr | 
Loading checkpoint shards: 100%|██████████| 33/33 [01:05<00:00,  1.97s/it]
2024-07-11 22:33:39 | ERROR | stderr | 
2024-07-11 22:33:40 | INFO | model_worker | Register to controller
2024-07-11 22:33:40 | ERROR | stderr | INFO:     Started server process [41841]
2024-07-11 22:33:40 | ERROR | stderr | INFO:     Waiting for application startup.
2024-07-11 22:33:40 | ERROR | stderr | INFO:     Application startup complete.
2024-07-11 22:33:40 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:40007 (Press CTRL+C to quit)
2024-07-11 22:33:55 | INFO | model_worker | Send heart beat. Models: ['InternVL2-78B']. Semaphore: None. global_counter: 0
2024-07-11 22:34:02 | INFO | stdout | INFO:     10.140.60.209:44754 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:34:04 | INFO | stdout | INFO:     10.140.60.209:44774 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:34:05 | INFO | stdout | INFO:     10.140.60.209:44794 - "POST /worker_get_status HTTP/1.1" 200 OK
2024-07-11 22:34:05 | INFO | model_worker | Send heart beat. Models: ['InternVL2-78B']. Semaphore: Semaphore(value=4, locked=False). global_counter: 1
2024-07-11 22:34:05 | INFO | stdout | INFO:     10.140.60.209:44802 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-07-11 22:34:05 | INFO | model_worker | max_input_tile_list: [12]
2024-07-11 22:34:06 | INFO | model_worker | Split images to torch.Size([13, 3, 448, 448])
2024-07-11 22:34:06 | INFO | model_worker | []
2024-07-11 22:34:06 | INFO | model_worker | Generation config: {'num_beams': 1, 'max_new_tokens': 2048, 'do_sample': True, 'temperature': 0.8, 'repetition_penalty': 1.1, 'max_length': 8192, 'top_p': 0.7, 'streamer': <transformers.generation.streamers.TextIteratorStreamer object at 0x7fca140d51e0>}
2024-07-11 22:34:07 | WARNING | transformers.generation.utils | Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
2024-07-11 22:34:07 | WARNING | transformers.generation.utils | Both `max_new_tokens` (=2048) and `max_length`(=8192) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
2024-07-11 22:34:10 | INFO | model_worker | Send heart beat. Models: ['InternVL2-78B']. Semaphore: Semaphore(value=4, locked=False). global_counter: 1
2024-07-11 22:34:13 | ERROR | stderr | Exception in thread Thread-3 (chat):
2024-07-11 22:34:13 | ERROR | stderr | Traceback (most recent call last):
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
2024-07-11 22:34:13 | ERROR | stderr |     self.run()
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/threading.py", line 946, in run
2024-07-11 22:34:13 | ERROR | stderr |     self._target(*self._args, **self._kwargs)
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/.cache/huggingface/modules/transformers_modules/InternVL2-78B/modeling_internvl_chat.py", line 283, in chat
2024-07-11 22:34:13 | ERROR | stderr |     generation_output = self.generate(
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-07-11 22:34:13 | ERROR | stderr |     return func(*args, **kwargs)
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/.cache/huggingface/modules/transformers_modules/InternVL2-78B/modeling_internvl_chat.py", line 333, in generate
2024-07-11 22:34:13 | ERROR | stderr |     outputs = self.language_model.generate(
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2024-07-11 22:34:13 | ERROR | stderr |     return func(*args, **kwargs)
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/utils.py", line 1525, in generate
2024-07-11 22:34:13 | ERROR | stderr |     return self.sample(
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/utils.py", line 2641, in sample
2024-07-11 22:34:13 | ERROR | stderr |     next_token_scores = logits_processor(input_ids, next_token_logits)
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in __call__
2024-07-11 22:34:13 | ERROR | stderr |     scores = processor(input_ids, scores)
2024-07-11 22:34:13 | ERROR | stderr |   File "/mnt/petrelfs/wangweiyun/miniconda3/envs/internvl-apex/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 333, in __call__
2024-07-11 22:34:13 | ERROR | stderr |     score = torch.gather(scores, 1, input_ids)
2024-07-11 22:34:13 | ERROR | stderr | RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:4 and cuda:0! (when checking argument for argument index in method wrapper_CUDA_gather)
2024-07-11 22:34:16 | INFO | model_worker | Send heart beat. Models: ['InternVL2-78B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
2024-07-11 22:34:25 | INFO | model_worker | Send heart beat. Models: ['InternVL2-78B']. Semaphore: Semaphore(value=5, locked=False). global_counter: 1
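The traceback above bottoms out in `torch.gather(scores, 1, input_ids)` inside transformers' repetition-penalty logits processor. With `device='auto'` the 78B model is sharded across GPUs, so `scores` comes back on whichever device holds the output head (here `cuda:4`) while `input_ids` still live on `cuda:0`, hence the `wrapper_CUDA_gather` device-mismatch error. Below is a minimal, CPU-runnable sketch of the penalty step with the usual workaround of moving `input_ids` onto `scores.device` before gathering. The function name is ours, not a transformers API; it mirrors the library's penalty rule (multiply negative scores, divide positive ones) but is only an illustration of the fix, not the upstream patch.

```python
import torch

def apply_repetition_penalty(scores: torch.Tensor,
                             input_ids: torch.Tensor,
                             penalty: float) -> torch.Tensor:
    """Penalize logits of already-generated tokens, aligning devices first."""
    # Align devices: under device_map="auto" `scores` can sit on a different
    # GPU than `input_ids`, which is exactly what the logged RuntimeError
    # complains about. On a single device this .to() is a no-op.
    input_ids = input_ids.to(scores.device)
    # Pull out the logits of tokens that have already been generated.
    score = torch.gather(scores, 1, input_ids)
    # Same rule transformers applies: make repeated tokens less likely
    # whether their logit is negative (multiply) or positive (divide).
    score = torch.where(score < 0, score * penalty, score / penalty)
    # Write the penalized logits back into the full score tensor.
    return scores.scatter(1, input_ids, score)
```

On a single-GPU or CPU run the extra `.to()` costs nothing; in the sharded setup logged above it is what keeps the gather from seeing `cuda:4` and `cuda:0` tensors at once.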