huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
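A minimal way to act on this warning, sketched under the assumption that the tokenizer is created in the parent process before DataLoader workers fork (the environment variable name comes from the warning itself; the checkpoint name below is illustrative only):

    # shell: export TOKENIZERS_PARALLELISM=false
    import os
    os.environ["TOKENIZERS_PARALLELISM"] = "false"  # set before tokenizers is used in this process
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint, not this run's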
wandb: WARNING Serializing object of type dict that is 589920 bytes
0%| | 0/70340 [00:00<?, ?it/s]
/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/data/data_collator.py:132: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:210.)
batch[k] = torch.tensor([f[k] for f in features])
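The fix the warning suggests, as a sketch only (the feature dicts below are stand-ins with an assumed key and shape, not this run's real batch): stack the per-example arrays into one ndarray before building the tensor.

    import numpy as np
    import torch

    features = [{"input_ids": np.zeros(8, dtype=np.int64)} for _ in range(4)]  # stand-in examples
    batch = {}
    for k in features[0]:
        # slow path flagged above: torch.tensor([f[k] for f in features])
        batch[k] = torch.tensor(np.array([f[k] for f in features]))  # one ndarray, then one tensor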
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/main.py:598 in <module> │
│ │
│ 595 │ main() │
│ │
│ /n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/main.py:513 in main │
│ │
│ 510 │ │ train_result = trainer.train(resume_from_checkpoint=checkpoint) │
│ 511 │ │ metrics = train_result.metrics │
│ 512 │ │ max_train_samples = ( │
│ ❱ 513 │ │ │ data_args.max_train_samples if data_args.max_train_samples is not None else │
│ 514 │ │ ) │
│ 515 │ │ metrics["train_samples"] = min(max_train_samples, len(train_dataset)) │
│ 516 │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py:1409 in train │
│ │
│ 1406 │ │ inner_training_loop = find_executable_batch_size( │
│ 1407 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1408 │ │ ) │
│ ❱ 1409 │ │ return inner_training_loop( │
│ 1410 │ │ │ args=args, │
│ 1411 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1412 │ │ │ trial=trial, │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py:1651 in │
│ _inner_training_loop │
│ │
│ 1648 │ │ │ │ │ with model.no_sync(): │
│ 1649 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1650 │ │ │ │ else: │
│ ❱ 1651 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1652 │ │ │ │ │
│ 1653 │ │ │ │ if ( │
│ 1654 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py:2349 in │
│ training_step │
│ │
│ 2346 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2347 │ │ │
│ 2348 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2349 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2350 │ │ │
│ 2351 │ │ if self.args.n_gpu > 1: │
│ 2352 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py:2381 in │
│ compute_loss │
│ │
│ 2378 │ │ │ labels = inputs.pop("labels") │
│ 2379 │ │ else: │
│ 2380 │ │ │ labels = None │
│ ❱ 2381 │ │ outputs = model(**inputs) │
│ 2382 │ │ # Save past state if it exists │
│ 2383 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2384 │ │ if self.args.past_index >= 0: │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1110 in │
│ _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py:168 │
│ in forward │
│ │
│ 165 │ │ │ if len(self.device_ids) == 1: │
│ 166 │ │ │ │ return self.module(*inputs[0], **kwargs[0]) │
│ 167 │ │ │ replicas = self.replicate(self.module, self.device_ids[:len(inputs)]) │
│ ❱ 168 │ │ │ outputs = self.parallel_apply(replicas, inputs, kwargs) │
│ 169 │ │ │ return self.gather(outputs, self.output_device) │
│ 170 │ │
│ 171 │ def replicate(self, module, device_ids): │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py:178 │
│ in parallel_apply │
│ │
│ 175 │ │ return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim) │
│ 176 │ │
│ 177 │ def parallel_apply(self, replicas, inputs, kwargs): │
│ ❱ 178 │ │ return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)]) │
│ 179 │ │
│ 180 │ def gather(self, outputs, output_device): │
│ 181 │ │ return gather(outputs, output_device, dim=self.dim) │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py:86 │
│ in parallel_apply │
│ │
│ 83 │ for i in range(len(inputs)): │
│ 84 │ │ output = results[i] │
│ 85 │ │ if isinstance(output, ExceptionWrapper): │
│ ❱ 86 │ │ │ output.reraise() │
│ 87 │ │ outputs.append(output) │
│ 88 │ return outputs │
│ 89 │
│ │
│ /n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/_utils.py:457 in reraise │
│ │
│ 454 │ │ │ # If the exception takes multiple arguments, don't try to │
│ 455 │ │ │ # instantiate since we don't know how to │
│ 456 │ │ │ raise RuntimeError(msg) from None │
│ ❱ 457 │ │ raise exception │
│ 458 │
│ 459 │
│ 460 def _get_available_device_type(): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 459, in forward
File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 247, in coil_forward
lab_reps = self.tok_proj(outputs_lab.last_hidden_state @ self.label_projection.weight) # Q * LQ * d
File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 399, in forward_label_embeddings
desc_attention_mask: Optional[List[int]] = None,
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 1018, in forward
encoder_outputs = self.encoder(
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 607, in forward
layer_outputs = layer_module(
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 493, in forward
self_attention_outputs = self.attention(
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 423, in forward
self_outputs = self.self(
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 355, in forward
attention_probs = self.dropout(attention_probs)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/dropout.py", line 58, in forward
return F.dropout(input, self.p, self.training, self.inplace)
File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 1279, in dropout
return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: CUDA out of memory. Tried to allocate 782.00 MiB (GPU 0; 10.76 GiB total capacity; 3.28 GiB already allocated; 61.69 MiB free; 3.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
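The error message names the two usual levers: the PYTORCH_CUDA_ALLOC_CONF allocator hint and requesting less memory per step. A hedged sketch of both, assuming the run is configured through Hugging Face TrainingArguments (the batch and accumulation values are illustrative, not this run's config):

    # shell: export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
    import os
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")  # must be set before CUDA initializes

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=4,   # smaller per-GPU batch...
        gradient_accumulation_steps=4,   # ...same effective batch via accumulation
        fp16=True,                       # halves activation memory where supported
    )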