Does "Hyper-SD15-1step-lora.safetensors" work for img2img?
Hi, I'm trying to use "Hyper-SD15-1step-lora.safetensors" with StableDiffusionControlNetImg2ImgPipeline. However, it seems that the TCD scheduler ends up producing empty latents here:
https://github.com/huggingface/diffusers/blob/39215aa30e54586419fd3aa1ee467cbee2db908e/src/diffusers/pipelines/controlnet/pipeline_controlnet_img2img.py#L863-L864
Specifically, in the line init_latents = self.scheduler.add_noise(init_latents, noise, timestep), the input init_latents has the correct shape, say [1, 4, 64, 64], but the output init_latents (after the scheduler adds noise) somehow has shape [0, 4, 64, 64]. Is this expected, or am I doing something wrong here?
If this is expected with TCDScheduler, is TCD required for using Hyper-SD15-1step-lora.safetensors? If not, what other schedulers are recommended? Thanks in advance.
Hi @zjysteven, can you provide your example inference script so we can check it for you?
Please see this minimal script:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '7'

import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetImg2ImgPipeline,
    TCDScheduler,
)
from diffusers.utils import load_image, make_image_grid
from huggingface_hub import hf_hub_download

controlnet = ControlNetModel.from_pretrained(
    'lllyasviel/control_v11f1e_sd15_tile',
    torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None
).to('cuda')
pipe.enable_xformers_memory_efficient_attention()
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SD15-1step-lora.safetensors"))

original = load_image(
    'https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile/resolve/main/images/original.png'
)
original = original.resize((512, 512))
low_res = original.resize((64, 64))

prompt = "a dog sitting on the grass, realistic, best quality, extremely detailed"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"

generator = torch.manual_seed(2)
eta = 1.0
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=low_res,
    control_image=low_res,
    width=512,
    height=512,
    num_inference_steps=1,
    guidance_scale=0.0,
    eta=eta,
    strength=0.8,
    generator=generator,
).images[0]
Running it will yield the following error:
Traceback (most recent call last):
  File "/home/jz288/coadp_lcm_stream/hypersd_bug.py", line 41, in <module>
    image = pipe(
  File "/home/jz288/anaconda3/envs/vc/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jz288/diffusers/src/diffusers/pipelines/controlnet/pipeline_controlnet_img2img.py", line 1302, in __call__
    image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)
  File "/home/jz288/diffusers/src/diffusers/image_processor.py", line 603, in postprocess
    image = torch.stack(
RuntimeError: stack expects a non-empty TensorList
After some investigation, the reason is what I mentioned above: the initialized latent somehow has shape [0, 4, 64, 64], so the output image is [0, 3, 512, 512] and therefore an empty TensorList.
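For reference, the batch dimension of add_noise's output appears to follow the length of the timestep tensor it receives, so an empty timestep tensor would explain the [0, 4, 64, 64] shape. A minimal sketch, assuming a plain default TCDScheduler config rather than the one from the pipeline:

import torch
from diffusers import TCDScheduler

# Default-config TCDScheduler, used only to isolate the add_noise call
scheduler = TCDScheduler()
scheduler.set_timesteps(num_inference_steps=1)

init_latents = torch.randn(1, 4, 64, 64)
noise = torch.randn_like(init_latents)

one_step = scheduler.timesteps[:1]  # tensor holding the single timestep
no_steps = scheduler.timesteps[1:]  # empty tensor of shape [0]

# The output batch size tracks the timestep tensor, not the latents
print(scheduler.add_noise(init_latents, noise, one_step).shape)  # torch.Size([1, 4, 64, 64])
print(scheduler.add_noise(init_latents, noise, no_steps).shape)  # torch.Size([0, 4, 64, 64])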
Hi @zjysteven,
You would need to set strength=1.0 to make num_inference_steps=1 work.
In the img2img pipelines, strength scales the number of denoising steps that are actually run on the init image, so with num_inference_steps=1 and strength=0.8 the effective step count is int(1 * 0.8) = 0 and the timestep tensor passed to add_noise ends up empty.
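Here is a paraphrased sketch of the timestep selection arithmetic (simplified; not the exact diffusers source):

# Paraphrased sketch of how img2img pipelines pick the timesteps to run;
# names and structure are simplified, not the exact get_timesteps code.
def effective_timesteps(scheduler_timesteps, num_inference_steps, strength):
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return scheduler_timesteps[t_start:]

timesteps = [999]  # pretend the scheduler produced a single timestep
print(effective_timesteps(timesteps, 1, 0.8))  # [] -> int(1 * 0.8) = 0, nothing left to run
print(effective_timesteps(timesteps, 1, 1.0))  # [999] -> the single step survives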
You are absolutely right. Can't believe I missed this simple detail earlier. Thank you.
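For completeness, the script above works once strength is raised to 1.0; only the changed call is shown here:

# Identical to the call in the script above except strength=1.0,
# so the single inference step is actually executed.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=low_res,
    control_image=low_res,
    width=512,
    height=512,
    num_inference_steps=1,
    guidance_scale=0.0,
    eta=eta,
    strength=1.0,
    generator=generator,
).images[0]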