W1205 13:38:14.913000 139671368247104 torch/distributed/run.py:779] 
W1205 13:38:14.913000 139671368247104 torch/distributed/run.py:779] *****************************************
W1205 13:38:14.913000 139671368247104 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W1205 13:38:14.913000 139671368247104 torch/distributed/run.py:779] *****************************************
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
df: /root/.triton/autotune: No such file or directory
df: /root/.triton/autotune: No such file or directory
df: /root/.triton/autotune: No such file or directory
df: /root/.triton/autotune: No such file or directory
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
  warnings.warn(
[2024-12-05 13:38:47,217] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-12-05 13:38:47,217] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-12-05 13:38:47,218] [INFO] [comm.py:652:init_distributed] cdb=None
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
  warnings.warn(
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
  warnings.warn(
[2024-12-05 13:38:47,229] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-12-05 13:38:47,230] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-12-05 13:38:47,231] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-12-05 13:38:47,231] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-12-05 13:38:47,231] [INFO] [comm.py:652:init_distributed] cdb=None
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
  warnings.warn(
[2024-12-05 13:38:47,237] [INFO] [comm.py:652:init_distributed] cdb=None
12/05/2024 13:38:47 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|configuration_utils.py:670] 2024-12-05 13:38:48,151 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/config.json
[WARNING|modeling_rope_utils.py:379] 2024-12-05 13:38:48,155 >> Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO|configuration_utils.py:739] 2024-12-05 13:38:48,156 >> Model config Qwen2VLConfig {
  "_name_or_path": "/nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "rope_type": "default",
    "type": "default"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2478] 2024-12-05 13:38:48,699 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|image_processing_base.py:373] 2024-12-05 13:38:48,771 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/preprocessor_config.json
[INFO|image_processing_base.py:373] 2024-12-05 13:38:48,793 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/preprocessor_config.json
[INFO|image_processing_base.py:429] 2024-12-05 13:38:48,793 >> Image processor Qwen2VLImageProcessor {
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_processor_type": "Qwen2VLImageProcessor",
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],
  "max_pixels": 12845056,
  "merge_size": 2,
  "min_pixels": 3136,
  "patch_size": 14,
  "processor_class": "Qwen2VLProcessor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "max_pixels": 12845056,
    "min_pixels": 3136
  },
  "temporal_patch_size": 2
}
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,805 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2478] 2024-12-05 13:38:49,114 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|processing_utils.py:744] 2024-12-05 13:38:49,684 >> Processor Qwen2VLProcessor:
- image_processor: Qwen2VLImageProcessor {
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_processor_type": "Qwen2VLImageProcessor",
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],
  "max_pixels": 12845056,
  "merge_size": 2,
  "min_pixels": 3136,
  "patch_size": 14,
  "processor_class": "Qwen2VLProcessor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "max_pixels": 12845056,
    "min_pixels": 3136
  },
  "temporal_patch_size": 2
}

- tokenizer: Qwen2TokenizerFast(name_or_path='/nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={
  151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

{
  "processor_class": "Qwen2VLProcessor"
}

12/05/2024 13:38:49 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
12/05/2024 13:38:49 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl...
12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 5754 examples [00:00, 43710.45 examples/s]
Generating train split: 23931 examples [00:00, 83108.86 examples/s]
Generating train split: 24618 examples [00:00, 78770.57 examples/s]
12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
12/05/2024 13:38:52 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
Converting format of dataset (num_proc=320): 0%| | 0/24618 [00:00
dlc1o8747bm15inu-master-0:71:71 [0] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1o8747bm15inu-master-0:71:71 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
dlc1o8747bm15inu-master-0:71:71 [0] NCCL INFO cudaDriverVersion 12010
NCCL version 2.20.5+cuda12.4
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO cudaDriverVersion 12010
dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO cudaDriverVersion 12010
dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO cudaDriverVersion 12010
dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO cudaDriverVersion 12010
dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO cudaDriverVersion 12010
dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO cudaDriverVersion 12010
dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO cudaDriverVersion 12010
dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0>
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO comm 0xa547c690 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x28a6667510b10773 - Init START
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO comm 0xa4c31ca0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x28a6667510b10773 - Init START
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO comm 0xa6df0650 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x28a6667510b10773 - Init START
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO comm 0xa55b37a0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x28a6667510b10773 - Init START
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO comm 0xaf287310 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x28a6667510b10773 - Init START
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO comm 0xa4b2bc20 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x28a6667510b10773 - Init START
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO comm 0xa75046a0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x28a6667510b10773 - Init START
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO comm 0xc0177130 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x28a6667510b10773 - Init START
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff,ffffffff
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NVLS multicast support is not available on dev 1
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NVLS multicast support is not available on dev 7
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff,ffffffff
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NVLS multicast support is not available on dev 2
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NVLS multicast support is not available on dev 4
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NVLS multicast support is not available on dev 6
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,ffffffff
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NVLS multicast support is not available on dev 0
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff,ffffffff
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NVLS multicast support is not available on dev 3
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NVLS multicast support is not available on dev 5
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO comm 0xaf287310 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO comm 0xa75046a0 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO comm 0xa6df0650 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO comm 0xa547c690 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO comm 0xa4c31ca0 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO comm 0xa55b37a0 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO comm 0xa4b2bc20 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO comm 0xc0177130 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via
P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO 24 coll channels, 0 collnet channels, 0 
nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO comm 0xa75046a0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 
busId 40 commId 0x28a6667510b10773 - Init COMPLETE dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO comm 0xa4b2bc20 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x28a6667510b10773 - Init COMPLETE dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO comm 0xa55b37a0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x28a6667510b10773 - Init COMPLETE dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO comm 0xa4c31ca0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x28a6667510b10773 - Init COMPLETE dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO comm 0xa547c690 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x28a6667510b10773 - Init COMPLETE dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO comm 0xaf287310 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x28a6667510b10773 - Init COMPLETE dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO comm 0xc0177130 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x28a6667510b10773 - Init COMPLETE dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO comm 0xa6df0650 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x28a6667510b10773 - Init COMPLETE 12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... 12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... 12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... 12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... 12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... 12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... 12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... 
Running tokenizer on dataset (num_proc=320): 0%| | 0/24618 [00:00system You are a helpful assistant.<|im_end|> <|im_start|>user <|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pa
d|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_
pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|imag
e_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|> You are a GUI task expert, I will provide you with a high-level instruction, a screenshot with its corresponding accessibility tree. 
High-level instruction: Exit the Camera app and return to the home screen. Accessibility tree: {"android.view.View com.android.camera2 com.android.camera2:id/preview_overlay": "(540.0, 1232.5)", "Shutter": "(540.0, 2179.5)", "MODE LIST": "(124.0, 191.0)", "FILMSTRIP": "(371.0, 191.0)", "Z-": "(609.5, 191.0)", "Z+": "(840.5, 191.0)", "Countdown timer is off": "(382.0, 1927.0)", "Grid lines off": "(540.0, 1927.0)", "Back camera": "(698.0, 1927.0)"} Please generate the low-level thought and action for the next step.<|im_end|> <|im_start|>assistant Low-level thought: Press the back button to exit the Camera app and return to the home screen. action: {"action_type":"navigate_back"}<|im_end|> label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 24187, 11591, 3381, 25, 8445, 279, 1182, 3137, 311, 4869, 279, 14332, 906, 323, 470, 311, 279, 2114, 4171, 624, 1311, 25, 5212, 1311, 1819, 3252, 
70839, 3895, 9207, 151645]
labels:
Low-level thought: Press the back button to exit the Camera app and return to the home screen.
action: {"action_type":"navigate_back"}<|im_end|>
[INFO|configuration_utils.py:670] 2024-12-05 13:40:16,953 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/config.json
[WARNING|modeling_rope_utils.py:379] 2024-12-05 13:40:16,953 >> Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO|configuration_utils.py:739] 2024-12-05 13:40:16,954 >> Model config Qwen2VLConfig {
  "_name_or_path": "/nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "rope_type": "default",
    "type": "default"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}
[INFO|modeling_utils.py:3723] 2024-12-05 13:40:17,192 >> loading weights file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1622] 2024-12-05 13:40:17,294 >> Instantiating Qwen2VLForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1099] 2024-12-05 13:40:17,295 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
[WARNING|logging.py:328] 2024-12-05 13:40:17,317 >> `Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 0%| | 0/5 [00:00> All model checkpoint weights were used when initializing Qwen2VLForConditionalGeneration.
[INFO|modeling_utils.py:4576] 2024-12-05 13:42:11,359 >> All the weights of Qwen2VLForConditionalGeneration were initialized from the model checkpoint at /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2VLForConditionalGeneration for predictions without further training.
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
[INFO|configuration_utils.py:1052] 2024-12-05 13:42:11,380 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/generation_config.json
[INFO|configuration_utils.py:1099] 2024-12-05 13:42:11,380 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.01,
  "top_k": 1,
  "top_p": 0.001
}
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
Detected kernel version 4.19.91, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:667] 2024-12-05 13:42:11,800 >> Using auto half precision backend
[2024-12-05 13:42:15,269] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed info: version=0.15.4, git-hash=unknown, git-branch=unknown
[2024-12-05 13:42:15,269] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 8
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Using non-device net plugin version 0
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Using network IB
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO bootstrapSplit: comm 0xa6a0b1d0 parent 0xa4c31ca0 rank 1 nranks 8 color -934961569 key 1 prev 0 next 2 - DONE
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO bootstrapSplit: comm 0xa6f91730 parent 0xa547c690 rank 7 nranks 8 color -934961569 key 7 prev 6 next 0 - DONE
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO bootstrapSplit: comm 0xa79ebd30 parent 0xa4b2bc20 rank 5 nranks 8 color -934961569 key 5 prev 4 next 6 - DONE
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO bootstrapSplit: comm 0xa6c5e6a0 parent 0xaf287310 rank 4 nranks 8 color -934961569 key 4 prev 3 next 5 - DONE
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO bootstrapSplit: comm 0xa7df0940 parent 0xa55b37a0 rank 6 nranks 8 color -934961569 key 6 prev 5 next 7 - DONE
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO bootstrapSplit: comm 0xc33c3ce0 parent 0xc0177130 rank 0 nranks 8 color -934961569 key 0 prev 7 next 1 - DONE
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO comm 0xa6f91730 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x1900e06671419955 - Init START
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO comm 0xa79ebd30 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x1900e06671419955 - Init START
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO comm 0xa7df0940 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x1900e06671419955 - Init START
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO comm 0xa6a0b1d0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x1900e06671419955 - Init START
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO comm 0xa6c5e6a0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x1900e06671419955 - Init START
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO bootstrapSplit: comm 0xa5e6c900 parent 0xa75046a0 rank 3 nranks 8 color -934961569 key 3 prev 2 next 4 - DONE
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO comm 0xc33c3ce0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x1900e06671419955 - Init START
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO comm 0xa5e6c900 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x1900e06671419955 - Init START
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO bootstrapSplit: comm 0xa57b0e70 parent 0xa6df0650 rank 2 nranks 8 color -934961569 key 2 prev 1 next 3 - DONE
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO comm 0xa57b0e70 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x1900e06671419955 - Init START
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff,ffffffff
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO NVLS multicast support is not available on dev 1
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO NVLS multicast support is not available on dev 7
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO NVLS multicast support is not available on dev 6
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO NVLS multicast support is not available on dev 4
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff,ffffffff
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO NVLS multicast support is not available on dev 3
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO NVLS multicast support is not available on dev 5
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,ffffffff
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO NVLS multicast support is not available on dev 0
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff,ffffffff
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO NVLS multicast support is not available on dev 2
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO comm 0xa6a0b1d0 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO comm 0xc33c3ce0 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO comm 0xa6f91730 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO comm 0xa7df0940 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO comm 0xa6c5e6a0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO comm 0xa79ebd30 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO comm 0xa5e6c900 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO comm 0xa57b0e70 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO P2P Chunksize set to 524288
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL
INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO 
Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO 
Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO 
Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO 
Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO 
threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Connected all trees dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p 
channels, 32 p2p channels per peer dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO comm 0xa57b0e70 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x1900e06671419955 - Init COMPLETE dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO comm 0xa6f91730 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x1900e06671419955 - Init COMPLETE dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO comm 0xa7df0940 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x1900e06671419955 - Init COMPLETE dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO comm 0xa79ebd30 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x1900e06671419955 - Init COMPLETE dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO comm 0xa6c5e6a0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x1900e06671419955 - Init COMPLETE dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO comm 0xa5e6c900 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x1900e06671419955 - Init COMPLETE dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO comm 0xc33c3ce0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x1900e06671419955 - Init COMPLETE dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO comm 0xa6a0b1d0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x1900e06671419955 - Init COMPLETE [2024-12-05 13:42:17,406] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root... Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root... Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam...Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam...Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam... 
Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam...
Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam...
Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam...
Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam...
Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py311_cu121/fused_adam/build.ninja...
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/adam -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/TH -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[2/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output multi_tensor_adam.cuda.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/adam -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/TH -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -std=c++17 -c /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 34.222880601882935 seconds
[2024-12-05 13:42:51,649] [INFO] [logging.py:128:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2024-12-05 13:42:51,649] [INFO] [logging.py:128:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 34.243221044540405 seconds
Time to load fused_adam op: 34.24309253692627 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 34.24324321746826 seconds
Time to load fused_adam op: 34.243529319763184 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 34.243879318237305 seconds
Time to load fused_adam op: 34.244356632232666 seconds
[2024-12-05 13:42:51,666] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2024-12-05 13:42:51,666] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2024-12-05 13:42:51,666] [INFO] [logging.py:128:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 1 optimizer
[2024-12-05 13:42:51,666] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 1000000000
[2024-12-05 13:42:51,666] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 1000000000
[2024-12-05 13:42:51,666] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2024-12-05 13:42:51,666] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
Loading extension module fused_adam...
Time to load fused_adam op: 34.29689931869507 seconds
[2024-12-05 13:43:03,153] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-12-05 13:43:03,154] [INFO] [utils.py:782:see_memory_usage] MA 18.99 GB Max_MA 20.77 GB CA 20.9 GB Max_CA 21 GB
[2024-12-05 13:43:03,154] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 133.14 GB, percent = 13.3%
[2024-12-05 13:43:03,464] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-12-05 13:43:03,465] [INFO] [utils.py:782:see_memory_usage] MA 18.99 GB Max_MA 22.54 GB CA 24.44 GB Max_CA 24 GB
[2024-12-05 13:43:03,465] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 133.14 GB, percent = 13.3%
[2024-12-05 13:43:03,465] [INFO] [stage_1_and_2.py:544:__init__] optimizer state initialized
[2024-12-05 13:43:03,760] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-12-05 13:43:03,761] [INFO] [utils.py:782:see_memory_usage] MA 18.99 GB Max_MA 18.99 GB CA 24.44 GB Max_CA 24 GB
[2024-12-05 13:43:03,761] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 118.96 GB, percent = 11.9%
[2024-12-05 13:43:03,763] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2024-12-05 13:43:03,763] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed using client callable to create LR scheduler
[2024-12-05 13:43:03,763] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed LR Scheduler = 
[2024-12-05 13:43:03,763] [INFO] [logging.py:128:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[[0.9, 0.999]]
[2024-12-05 13:43:03,765] [INFO] [config.py:999:print] DeepSpeedEngine configuration:
[2024-12-05 13:43:03,765] [INFO] [config.py:1003:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2024-12-05 13:43:03,765] [INFO] [config.py:1003:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
[2024-12-05 13:43:03,765] [INFO] [config.py:1003:print] amp_enabled .................. False
[2024-12-05 13:43:03,765] [INFO] [config.py:1003:print] amp_params ................... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] autotuning_config ............ {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] bfloat16_enabled ............. True
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] bfloat16_immediate_grad_update False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] checkpoint_parallel_write_pipeline False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] checkpoint_tag_validation_enabled True
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] checkpoint_tag_validation_fail False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] comms_config ................. 
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] communication_data_type ...... None
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] curriculum_enabled_legacy .... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] curriculum_params_legacy ..... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] data_efficiency_enabled ...... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] dataloader_drop_last ......... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] disable_allgather ............ False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] dump_state ................... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] dynamic_loss_scale_args ...... None
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] eigenvalue_enabled ........... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] eigenvalue_gas_boundary_resolution 1
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] eigenvalue_layer_num ......... 0
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] eigenvalue_max_iter .......... 100
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] eigenvalue_stability ......... 1e-06
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] eigenvalue_tol ............... 0.01
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] eigenvalue_verbose ........... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] elasticity_enabled ........... False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] flops_profiler_config ........ {
    "enabled": false,
    "recompute_fwd_factor": 0.0,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] fp16_auto_cast ............... None
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] fp16_enabled ................. False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] fp16_master_weights_and_gradients False
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] global_rank .................. 0
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] grad_accum_dtype ............. None
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] gradient_accumulation_steps .. 16
[2024-12-05 13:43:03,766] [INFO] [config.py:1003:print] gradient_clipping ............ 1.0
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] gradient_predivide_factor .... 1.0
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] graph_harvesting ............. False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] initial_dynamic_scale ........ 1
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] load_universal_checkpoint .... False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] loss_scale ................... 1.0
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] memory_breakdown ............. False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] mics_hierarchial_params_gather False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] mics_shard_size .............. -1
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] nebula_config ................ {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] optimizer_legacy_fusion ...... False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] optimizer_name ............... adamw
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] optimizer_params ............. {'lr': 1e-06, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.001}
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] pld_enabled .................. False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] pld_params ................... False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] prescale_gradients ........... False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] scheduler_name ............... None
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] scheduler_params ............. None
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] seq_parallel_communication_data_type torch.float32
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] sparse_attention ............. None
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] sparse_gradients_enabled ..... False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] steps_per_print .............. inf
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] timers_config ................ enabled=True synchronized=True
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] train_batch_size ............. 128
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] train_micro_batch_size_per_gpu 1
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] use_data_before_expert_parallel_ False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] use_node_local_storage ....... False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] wall_clock_breakdown ......... True
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] weight_quantization_config ... None
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] world_size ................... 8
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] zero_allow_untested_optimizer False
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] zero_config .................. stage=1 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=1000000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=1000000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] zero_enabled ................. True
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] zero_force_ds_cpu_optimizer .. True
[2024-12-05 13:43:03,767] [INFO] [config.py:1003:print] zero_optimization_stage ...... 1
[2024-12-05 13:43:03,768] [INFO] [config.py:989:print_user_config] json = {
    "zero_optimization": {
        "stage": 1,
        "allgather_partitions": true,
        "allgather_bucket_size": 1.000000e+09,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 1.000000e+09,
        "contiguous_gradients": true
    },
    "fp16": {
        "enabled": false,
        "auto_cast": true,
        "loss_scale": 0,
        "initial_scale_power": 32,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": true
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-06,
            "betas": [0.9, 0.999],
            "eps": 1e-08,
            "weight_decay": 0.001
        }
    },
    "gradient_accumulation_steps": 16,
    "gradient_clipping": 1.0,
    "steps_per_print": inf,
    "train_batch_size": 128,
    "train_micro_batch_size_per_gpu": 1,
    "wall_clock_breakdown": true
}
[INFO|trainer.py:2243] 2024-12-05 13:43:03,768 >> ***** Running training *****
[INFO|trainer.py:2244] 2024-12-05 13:43:03,768 >> Num examples = 24,618
[INFO|trainer.py:2245] 2024-12-05 13:43:03,768 >> Num Epochs = 2
[INFO|trainer.py:2246] 2024-12-05 13:43:03,768 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2249] 2024-12-05 13:43:03,768 >> Total train batch size (w.
parallel, distributed & accumulation) = 128 [INFO|trainer.py:2250] 2024-12-05 13:43:03,768 >> Gradient Accumulation steps = 16 [INFO|trainer.py:2251] 2024-12-05 13:43:03,768 >> Total optimization steps = 384 [INFO|trainer.py:2252] 2024-12-05 13:43:03,770 >> Number of trainable parameters = 7,615,616,512 0%| | 0/384 [00:00> Saving model checkpoint to /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384 [INFO|configuration_utils.py:407] 2024-12-05 15:32:31,937 >> Configuration saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/config.json [INFO|configuration_utils.py:868] 2024-12-05 15:32:31,972 >> Configuration saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/generation_config.json [INFO|modeling_utils.py:2838] 2024-12-05 15:33:11,815 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2649] 2024-12-05 15:33:11,856 >> tokenizer config file saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/tokenizer_config.json [INFO|tokenization_utils_base.py:2658] 2024-12-05 15:33:11,895 >> Special tokens file saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/special_tokens_map.json [2024-12-05 15:33:12,401] [INFO] [logging.py:128:log_dist] [Rank 0] [Torch] Checkpoint global_step384 is about to be saved! 
[2024-12-05 15:33:12,431] [INFO] [logging.py:128:log_dist] [Rank 0] Saving model checkpoint: /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/mp_rank_00_model_states.pt
[2024-12-05 15:33:12,431] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/mp_rank_00_model_states.pt...
[2024-12-05 15:33:52,818] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/mp_rank_00_model_states.pt.
[2024-12-05 15:33:53,188] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-12-05 15:36:48,940] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-12-05 15:36:49,848] [INFO] [engine.py:3536:_save_zero_checkpoint] zero checkpoint saved /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-12-05 15:36:49,848] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step384 is ready now!
[INFO|trainer.py:2505] 2024-12-05 15:36:52,089 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 6828.3197, 'train_samples_per_second': 7.211, 'train_steps_per_second': 0.056, 'train_loss': 0.19105235013800362, 'epoch': 2.0}
100%|██████████| 384/384 [1:53:48<00:00, 15.56s/it]
100%|██████████| 384/384 [1:53:48<00:00, 17.78s/it]
[INFO|image_processing_base.py:258] 2024-12-05 15:36:52,231 >> Image processor saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/preprocessor_config.json
[INFO|trainer.py:3705] 2024-12-05 15:36:57,061 >> Saving model checkpoint to /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025
[INFO|configuration_utils.py:407] 2024-12-05 15:36:57,118 >> Configuration saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/config.json
[INFO|configuration_utils.py:868] 2024-12-05 15:36:57,153 >> Configuration saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/generation_config.json
[INFO|modeling_utils.py:2838] 2024-12-05 15:37:37,945 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2649] 2024-12-05 15:37:38,009 >> tokenizer config file saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/tokenizer_config.json
[INFO|tokenization_utils_base.py:2658] 2024-12-05 15:37:38,079 >> Special tokens file saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/special_tokens_map.json
***** train metrics *****
  epoch                    = 1.9961
  total_flos               = 2845225360GF
  train_loss               = 0.1911
  train_runtime            = 1:53:48.31
  train_samples_per_second = 7.211
  train_steps_per_second   = 0.056
Figure saved at: /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/training_loss.png
12/05/2024 15:37:39 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
12/05/2024 15:37:39 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
[INFO|modelcard.py:449] 2024-12-05 15:37:39,381 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}