Huge memory consumption with SD3.5-medium

#18

by oddball516 - opened Nov 26, 2024

Nov 26, 2024

According to the picture here, SD3.5-medium should work fine on 10GB vRAM
https://stability.ai/news/introducing-stable-diffusion-3-5

However, my test program fails on a g4dn.xlarge AWS instance, which has 4 cores, 16 GB RAM, 48 GB swap, and a Tesla T4 GPU with 16 GB vRAM. It runs out of memory because CUDA cannot allocate any more. According to nvidia-smi, it was already using ~15 GB, and it couldn't complete even one picture.

I'm wondering what's wrong here?

Full source code attached.

import os
import json
import torch

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("./stable-diffusion-3.5-medium/")
if torch.cuda.is_available():
    print('use cuda')
    pipe = pipe.to("cuda")
elif torch.backends.mps.is_available():
    print('use mps')
    pipe = pipe.to('mps')
else:
    print('use cpu')

data = []
with open('data.json', 'r') as f:
    data = json.load(f)

os.makedirs('output', exist_ok=True)
for row in data:
    prompt   = '%s, style is %s, light is %s' % (row['prompt'], row['style'], row['light'])
    filename = 'output/%s.png' % (row['uuid'])
    height   = 1280
    width    = 1280
    
    if row['aspect_ratio'] == '16:9':
        # landscape: 1280 wide, 720 high
        width = 1280
        height = 720
    elif row['aspect_ratio'] == '9:16':
        # portrait: 720 wide, 1280 high
        width = 720
        height = 1280
    
    print('saving', filename)
    image = pipe(prompt, height=height, width=width).images[0]
    image.save(filename)

yue32000

26 days ago

Did this get resolved for you?

YaTharThShaRma999

26 days ago

@yue32000 @oddball516
The cause is the T5 text encoder. You can resolve it with
pipe.enable_model_cpu_offload()
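
For context, a minimal sketch of where that call might go in the script from the first post. The function name `load_offloaded_pipeline` is hypothetical, and passing `torch_dtype=torch.float16` is an added assumption (the original script does not set a dtype). As far as I know, `enable_model_cpu_offload()` needs the `accelerate` package installed, and the pipeline should not also be moved with `pipe.to("cuda")`, since offloading handles device placement itself.

```python
def load_offloaded_pipeline(model_dir: str):
    """Load the pipeline in half precision with model CPU offload enabled.

    Sketch only: the float16 dtype is an assumption, not part of the
    original script.
    """
    # Imported here so the heavy GPU stack is only required when called.
    import torch
    from diffusers import DiffusionPipeline

    # Half precision stores 2 bytes per weight instead of 4.
    pipe = DiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)

    # Keep sub-models in CPU RAM and move each one to the GPU only while
    # it runs. Do not call pipe.to("cuda") in addition to this.
    pipe.enable_model_cpu_offload()
    return pipe
```

In the original script this would replace the `from_pretrained` call and the `pipe.to("cuda")` branch, for example `pipe = load_offloaded_pipeline("./stable-diffusion-3.5-medium/")`.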

oddball516

about 14 hours ago

@YaTharThShaRma999 Do you know how enable_model_cpu_offload() works? Are you saying the T5 model will be offloaded to non-gpu memory?

YaTharThShaRma999

about 13 hours ago

@oddball516 Yeah, kind of. When it's needed, it is moved to the GPU for faster computation. After it's done computing (1-2 s), it is moved back to the CPU.

It's very big, in fact bigger than the actual image generation model itself (4B vs 2B parameters), but it is only used once per image and is fast.
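
To put rough numbers on that, here is a back-of-the-envelope estimate of weight memory using the approximate parameter counts quoted above (4B and 2B). It counts weights only and ignores the other text encoders, the VAE, and activations, so real usage will be higher. The float32 row is included on the assumption that loading without a `torch_dtype` argument, as the original script does, leaves the weights in float32.

```python
# Approximate parameter counts from the comment above, in billions.
T5_ENCODER_B = 4.0
IMAGE_MODEL_B = 2.0


def weights_gb(params_billion: float, bytes_per_param: int) -> float:
    """Weight memory in GB (10^9 bytes) for a given parameter count."""
    return params_billion * bytes_per_param


for name, nbytes in (("float32", 4), ("float16", 2)):
    t5 = weights_gb(T5_ENCODER_B, nbytes)
    dit = weights_gb(IMAGE_MODEL_B, nbytes)
    print(f"{name}: T5 {t5:.0f} GB + image model {dit:.0f} GB = {t5 + dit:.0f} GB")

# float32: T5 16 GB + image model 8 GB = 24 GB
# float16: T5 8 GB + image model 4 GB = 12 GB
```

On these figures, the two largest components alone would not fit in 16 GB at float32, which is consistent with the out-of-memory error in the first post.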

wonderlus

about 10 hours ago

edited about 10 hours ago

I want to be the best

oddball516

about 4 hours ago

Weird, it failed on the T4 with another error.

Traceback (most recent call last):
  File "/home/diffusers/main.py", line 12, in <module>
    pipe = DiffusionPipeline.from_pretrained(
  File "/home/diffusers/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/diffusers/venv/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 881, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "/home/diffusers/venv/lib/python3.10/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 703, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/diffusers/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/diffusers/venv/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 757, in from_pretrained
    unexpected_keys = load_model_dict_into_meta(
  File "/home/diffusers/venv/lib/python3.10/site-packages/diffusers/models/model_loading_utils.py", line 154, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load /root/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/b940f670f0eda2d07fbb75229e779da1ad11eb80/transformer because transformer_blocks.0.norm1.linear.bias expected shape tensor(..., device='meta', size=(9216,)), but got torch.Size([13824]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
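
A small diagnostic sketch for the mismatch above. The two sizes are copied from the error message; the helper names `size_ratio` and `installed_version` are hypothetical. The checkpoint tensor is exactly 1.5 times the size the installed model class allocated, which is the kind of mismatch that could appear when the installed diffusers release predates the checkpoint's architecture. That reading is an assumption and is not confirmed anywhere in this thread, so printing the installed version is only a first step.

```python
from fractions import Fraction
from importlib import metadata

# Sizes copied from the ValueError above.
EXPECTED_BIAS = 9216   # allocated by the installed model class
FOUND_BIAS = 13824     # stored in the checkpoint file


def size_ratio(found: int, expected: int) -> Fraction:
    """Return found/expected as a reduced fraction."""
    return Fraction(found, expected)


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print("checkpoint / expected:", size_ratio(FOUND_BIAS, EXPECTED_BIAS))
    print("diffusers:", installed_version("diffusers"))
```

If the reported version is older than the model release, upgrading diffusers and reloading would be a reasonable thing to try before passing `ignore_mismatched_sizes=True`, which would leave the mismatched weights randomly initialized.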
