Recommended GPU specs for this model?
#1 by albertdigits - opened
I imagine this is meant to be run on something more powerful than an RTX 4090 :) I'm running into the following when running it on mine:
(openmusic) user@workstation:~/projects$ python -m qa_mdt.pipeline
Seed set to 0
Add-ons: [<function waveform_rs_48k at 0x70831024a3a0>]
Dataset initialize finished
Reload ckpt specified in the config file ./qa_mdt/checkpoint_389999.ckpt
/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
LatentDiffusion: Running in eps-prediction mode
/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/torchlibrosa/stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = librosa.util.pad_center(fft_window, n_fft)
/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/torch/functional.py:512: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3587.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/torchaudio/transforms/_transforms.py:580: UserWarning: Argument 'onesided' has been deprecated and has no influence on the behavior of this module.
warnings.warn(
mask ratio: 0.3 decode_layer: 8
DiffusionWrapper has 676.25 M params.
Keeping EMAs of 489.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 8, 64, 64) = 32768 dimensions.
making attention of type 'vanilla' with 512 in_channels
/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/user/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|████████████████████████████████████████| 528M/528M [00:12<00:00, 43.5MB/s]
Downloading vgg_lpips model from https://heibox.uni-heidelberg.de/f/607503859c864bc1b30b/?dl=1 to taming/modules/autoencoder/lpips/vgg.pth
8.19kB [00:00, 483kB/s]
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Removing weight norm...
Initial learning rate 1e-05
--> Reload weight of autoencoder from ./qa_mdt/checkpoints/hifi-gan/checkpoints/vae_mel_16k_64bins.ckpt
tokenizer_config.json: 100%|████████████████████████████████████████| 2.54k/2.54k [00:00<00:00, 1.44MB/s]
spiece.model: 100%|████████████████████████████████████████| 792k/792k [00:00<00:00, 10.7MB/s]
tokenizer.json: 100%|████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 7.03MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 1.38MB/s]
config.json: 100%|████████████████████████████████████████| 662/662 [00:00<00:00, 403kB/s]
model.safetensors: 100%|████████████████████████████████████████| 3.13G/3.13G [01:12<00:00, 43.4MB/s]
Waveform save path: ./log/latent_diffusion/qa_mdt/mos_as_token/val_0_09-24-19:05_cfg_scale_3.5_ddim_200_n_cand_3
Traceback (most recent call last):
File "/home/user/miniconda3/envs/openmusic/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/user/miniconda3/envs/openmusic/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/projects/qa_mdt/pipeline.py", line 94, in <module>
result = pipe("A modern synthesizer creating futuristic soundscapes.")
File "/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/projects/qa_mdt/pipeline.py", line 74, in __call__
infer(
File "/home/user/projects/qa_mdt/infer/infer_mos5.py", line 85, in infer
latent_diffusion.generate_sample(
File "/home/user/miniconda3/envs/openmusic/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/projects/qa_mdt/audioldm_train/modules/latent_diffusion/ddpm.py", line 1953, in generate_sample
with self.ema_scope("Plotting"):
File "/home/user/miniconda3/envs/openmusic/lib/python3.9/contextlib.py", line 119, in __enter__
return next(self.gen)
File "/home/user/projects/qa_mdt/audioldm_train/modules/latent_diffusion/ddpm.py", line 315, in ema_scope
self.model_ema.store(self.model.parameters())
File "/home/user/projects/qa_mdt/audioldm_train/modules/diffusionmodules/ema.py", line 68, in store
self.collected_params = [param.clone() for param in parameters]
File "/home/user/projects/qa_mdt/audioldm_train/modules/diffusionmodules/ema.py", line 68, in <listcomp>
self.collected_params = [param.clone() for param in parameters]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB. GPU
(openmusic) user@workstation:~/projects$ nvidia-smi
Tue Sep 24 19:05:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06 Driver Version: 555.42.06 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 Off | Off |
| 0% 39C P8 5W / 450W | 18MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3760 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+
(openmusic) user@workstation:~/projects$
Hi, I am the author.
The model's GPU memory requirement is close to 24 GB, so running it on a 24 GB card is not always straightforward.
You can try putting HiFi-GAN, the VAE, and Flan-T5 on the CPU; keeping only the core MDT model on the GPU works fine.
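The offloading idea can be sketched as below. The attribute names (`first_stage_model` for the VAE, `cond_stage_model` for the text encoder, `vocoder` for HiFi-GAN, `mdt` for the core model) are assumptions based on typical AudioLDM-style pipelines and may not match this repository exactly:

```python
import torch.nn as nn

def offload_submodules(model: nn.Module, submodule_names, device="cpu"):
    """Move the named submodules to `device`, leaving the rest of the
    model where it is. Returns the names of modules actually moved."""
    moved = []
    for name in submodule_names:
        sub = getattr(model, name, None)
        if isinstance(sub, nn.Module):
            sub.to(device)
            moved.append(name)
    return moved

# Toy stand-in for the pipeline: in the real code these would be the
# actual VAE, text encoder, and vocoder (the names here are hypothetical).
class ToyPipeline(nn.Module):
    def __init__(self):
        super().__init__()
        self.first_stage_model = nn.Linear(4, 4)  # stands in for the VAE
        self.cond_stage_model = nn.Linear(4, 4)   # stands in for Flan-T5
        self.vocoder = nn.Linear(4, 4)            # stands in for HiFi-GAN
        self.mdt = nn.Linear(4, 4)                # core diffusion model

pipe = ToyPipeline()
moved = offload_submodules(pipe, ["first_stage_model", "cond_stage_model", "vocoder"])
# The core model can then stay on (or be moved to) the GPU when one is available:
# pipe.mdt.to("cuda")
```

Note that tensors must also be moved at the hand-off points between devices (e.g. bringing latents to CPU before decoding them with a CPU-resident VAE), otherwise device-mismatch errors will follow.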