Does "Hyper-SD15-1step-lora.safetensors" work for img2img?
Hi, I'm trying to use "Hyper-SD15-1step-lora.safetensors" with StableDiffusionControlNetImg2ImgPipeline. However, it seems that the TCD scheduler ends up producing empty latents here:
https://github.com/huggingface/diffusers/blob/39215aa30e54586419fd3aa1ee467cbee2db908e/src/diffusers/pipelines/controlnet/pipeline_controlnet_img2img.py#L863-L864
Specifically, in the line init_latents = self.scheduler.add_noise(init_latents, noise, timestep), the input init_latents has the correct shape, say [1, 4, 64, 64], but the output init_latents (after the scheduler adds noise) somehow has shape [0, 4, 64, 64]. Is this expected, or am I doing something wrong here?
If this is expected with TCDScheduler, is TCD required for using Hyper-SD15-1step-lora.safetensors? If not, what other schedulers are recommended? Thanks in advance.
Hi @zjysteven, can you provide your example inference script so we can check it for you?
Please see this minimal script:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '7'

import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetImg2ImgPipeline,
    TCDScheduler,
)
from diffusers.utils import load_image, make_image_grid
from huggingface_hub import hf_hub_download

controlnet = ControlNetModel.from_pretrained(
    'lllyasviel/control_v11f1e_sd15_tile',
    torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None
).to('cuda')
pipe.enable_xformers_memory_efficient_attention()
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SD15-1step-lora.safetensors"))

original = load_image(
    'https://huggingface.co/lllyasviel/control_v11f1e_sd15_tile/resolve/main/images/original.png'
)
original = original.resize((512, 512))
low_res = original.resize((64, 64))

prompt = "a dog sitting on the grass, realistic, best quality, extremely detailed"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"

generator = torch.manual_seed(2)
eta = 1.0
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=low_res,
    control_image=low_res,
    width=512,
    height=512,
    num_inference_steps=1,
    guidance_scale=0.0,
    eta=eta,
    strength=0.8,
    generator=generator,
).images[0]
Running it will yield the following error:
Traceback (most recent call last):
  File "/home/jz288/coadp_lcm_stream/hypersd_bug.py", line 41, in <module>
    image = pipe(
  File "/home/jz288/anaconda3/envs/vc/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jz288/diffusers/src/diffusers/pipelines/controlnet/pipeline_controlnet_img2img.py", line 1302, in __call__
    image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)
  File "/home/jz288/diffusers/src/diffusers/image_processor.py", line 603, in postprocess
    image = torch.stack(
RuntimeError: stack expects a non-empty TensorList
After some investigation, the reason is what I mentioned above: the initialized latent somehow has shape [0, 4, 64, 64], so the output image is [0, 3, 512, 512] and therefore an empty TensorList.
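For reference, the batch dimension of add_noise's output appears to follow the length of the timestep tensor it receives, so an empty timestep tensor would explain the [0, 4, 64, 64] shape. A minimal sketch, assuming a plain default TCDScheduler config rather than the one from the pipeline:

import torch
from diffusers import TCDScheduler

# Default-config TCDScheduler, used only to isolate the add_noise call
scheduler = TCDScheduler()
scheduler.set_timesteps(num_inference_steps=1)

init_latents = torch.randn(1, 4, 64, 64)
noise = torch.randn_like(init_latents)

one_step = scheduler.timesteps[:1]  # tensor holding the single timestep
no_steps = scheduler.timesteps[1:]  # empty tensor of shape [0]

# The output batch size tracks the timestep tensor, not the latents
print(scheduler.add_noise(init_latents, noise, one_step).shape)  # torch.Size([1, 4, 64, 64])
print(scheduler.add_noise(init_latents, noise, no_steps).shape)  # torch.Size([0, 4, 64, 64])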
Hi @zjysteven,
You would need to set strength=1.0 to make num_inference_steps=1 work.
In the img2img pipelines, strength scales the number of denoising steps that are actually run on the init image, so with num_inference_steps=1 and strength=0.8 the effective step count is int(1 * 0.8) = 0 and the timestep tensor passed to add_noise ends up empty.
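Here is a paraphrased sketch of the timestep selection arithmetic (simplified; not the exact diffusers source):

# Paraphrased sketch of how img2img pipelines pick the timesteps to run;
# names and structure are simplified, not the exact get_timesteps code.
def effective_timesteps(scheduler_timesteps, num_inference_steps, strength):
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return scheduler_timesteps[t_start:]

timesteps = [999]  # pretend the scheduler produced a single timestep
print(effective_timesteps(timesteps, 1, 0.8))  # [] -> int(1 * 0.8) = 0, nothing left to run
print(effective_timesteps(timesteps, 1, 1.0))  # [999] -> the single step survives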
You are absolutely right. Can't believe I missed this simple detail earlier. Thank you.
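For completeness, the script above works once strength is raised to 1.0; only the changed call is shown here:

# Identical to the call in the script above except strength=1.0,
# so the single inference step is actually executed.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=low_res,
    control_image=low_res,
    width=512,
    height=512,
    num_inference_steps=1,
    guidance_scale=0.0,
    eta=eta,
    strength=1.0,
    generator=generator,
).images[0]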