Spaces: Runtime error
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
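The variable the warning names can be set from the launching script itself; a minimal sketch, assuming it runs before the first tokenizer is constructed in the parent process (the value "false" simply opts out of Rust-side parallelism):

    import os

    # Must be set before `tokenizers` is first used; the Rust backend reads it
    # once. "false" disables its thread pool so forked dataloader workers are safe.
    os.environ["TOKENIZERS_PARALLELISM"] = "false"
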
wandb: WARNING Serializing object of type dict that is 589920 bytes
0%|          | 0/70340 [00:00<?, ?it/s]
0%|          | 0/70340 [00:00<?, ?it/s]
/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/data/data_collator.py:132: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:210.)
  batch[k] = torch.tensor([f[k] for f in features])
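The remedy the warning itself suggests is to collapse the list into one contiguous array before handing it to torch. A minimal sketch with hypothetical stand-in data (the real `features` and `batch` live inside transformers' default data collator):

    import numpy as np
    import torch

    # Hypothetical stand-ins for the collator's per-example feature dicts.
    features = [{"input_ids": np.zeros(128, dtype=np.int64)} for _ in range(8)]

    batch = {}
    for k in features[0]:
        # np.array() builds one contiguous block, so torch.tensor copies once
        # instead of once per element; this also silences the UserWarning.
        batch[k] = torch.tensor(np.array([f[k] for f in features]))
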
Traceback (most recent call last):
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/main.py", line 598, in <module>
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/main.py", line 513, in main
    data_args.max_train_samples if data_args.max_train_samples is not None else len(train_dataset)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1409, in train
    return inner_training_loop(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1651, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2349, in training_step
    loss = self.compute_loss(model, inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2381, in compute_loss
    outputs = model(**inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 459, in forward
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 247, in coil_forward
    lab_reps = self.tok_proj(outputs_lab.last_hidden_state @ self.label_projection.weight)  # Q * LQ * d
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 399, in forward_label_embeddings
    desc_attention_mask: Optional[List[int]] = None,
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 1018, in forward
    encoder_outputs = self.encoder(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 607, in forward
    layer_outputs = layer_module(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 493, in forward
    self_attention_outputs = self.attention(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 423, in forward
    self_outputs = self.self(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 355, in forward
    attention_probs = self.dropout(attention_probs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 1279, in dropout
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: CUDA out of memory. Tried to allocate 782.00 MiB (GPU 0; 10.76 GiB total capacity; 3.28 GiB already allocated; 61.69 MiB free; 3.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
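Both remedies the error message points at can be applied before training starts; a sketch with illustrative rather than tuned values (the batch-size and accumulation numbers below are assumptions, not the run's actual settings):

    import os

    # Must be set before the first CUDA allocation; the 128 MiB split cap is
    # illustrative and trades allocator flexibility for less fragmentation.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    from transformers import TrainingArguments

    # Shrinking the per-device batch is the usual first resort for an OOM inside
    # a DataParallel replica; gradient accumulation preserves the effective batch.
    args = TrainingArguments(
        output_dir="out",                 # hypothetical path
        per_device_train_batch_size=4,    # assumed smaller than the failing run
        gradient_accumulation_steps=4,    # keeps the effective batch size
    )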