Apply for community grant: Personal project (gpu)

#1
by BestWishYsh - opened

Apply for community grant: Academic project (gpu). We develop ConsisID, an identity-preserving text-to-video generation model that keeps human identity consistent in the generated video.

arxiv: https://arxiv.org/abs/2411.17440
paper: https://huggingface.co/papers/2411.17440
page: https://pku-yuangroup.github.io/ConsisID/
code: https://github.com/PKU-YuanGroup/ConsisID

Hi @BestWishYsh , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
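For reference, a minimal sketch of the ZeroGPU pattern described in the usage docs: decorate the function that actually needs CUDA with @spaces.GPU so a GPU is attached only while it runs. The model ID and function names below are placeholders, not this Space's code.

import spaces
import torch
from diffusers import DiffusionPipeline

# Load the pipeline at startup; with ZeroGPU, CUDA calls made here are
# intercepted by the spaces package so they work before a GPU is attached.
pipe = DiffusionPipeline.from_pretrained("org/placeholder-model", torch_dtype=torch.float16)
pipe.to("cuda")

@spaces.GPU  # a GPU is allocated for the duration of each call
def generate(prompt: str):
    return pipe(prompt).images[0]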

thanks, let me check

BTW, it would be nice if you could provide more info, like an arXiv link, GitHub URL, etc., when opening a grant request. Your request was tagged as spam because it didn't have meaningful info.

I am sorry, thanks for your suggestion.

Hi @hysts, I have set @spaces.GPU(), but it still shows RuntimeError: No CUDA GPUs are available. Do you know how to fix it? Thanks.

Can you provide more info about the error, like a stack trace?

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
    torch.init(nvidia_uuid)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 354, in init
    torch.Tensor([0]).cuda()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 214, in gradio_handler
    raise res.value
RuntimeError: No CUDA GPUs are available
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 184, in gradio_handler
    schedule_response = client.schedule(task_id=task_id, request=request, duration=duration)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/client.py", line 119, in schedule
    raise gr.Error(
gradio.exceptions.Error: 'The requested GPU duration (240s) is larger than the maximum allowed'

Is duration=240s too long?
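The duration in the error comes from the @spaces.GPU decorator itself; a minimal sketch of requesting a shorter per-call slot (the exact maximum allowed depends on the ZeroGPU configuration, so the value below is illustrative):

import spaces

@spaces.GPU(duration=120)  # request 120s per call instead of the rejected 240s
def infer(prompt: str):
    ...  # GPU-bound work goes here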

Thanks! I've been looking into the issue, but so far haven't been able to resolve it. The fact that this Space takes like 10 minutes or so just to restart is making debugging difficult. Anyway, I'll let you know once I find something.

Well, I'm not 100% sure what the cause of the error is, but it looks like this diff fixes the CUDA error:

diff --git a/app.py b/app.py
index e119d2b..60bb4de 100644
--- a/app.py
+++ b/app.py
@@ -122,7 +122,6 @@ def infer(
     num_inference_steps: int,
     guidance_scale: float,
     seed: int = 42,
-    progress=gr.Progress(track_tqdm=True),
 ):
     if seed == -1:
         seed = random.randint(0, 2**8 - 1)
@@ -170,6 +169,38 @@ def infer(
     return (video_pt, seed)
 
 
+@spaces.GPU(duration=180)
+def generate(
+    prompt,
+    image_input,
+    seed_value,
+    scale_status,
+    rife_status,
+):
+    latents, seed = infer(
+        prompt,
+        image_input,
+        num_inference_steps=50,
+        guidance_scale=7.0,
+        seed=seed_value,
+    )
+    if scale_status:
+        latents = upscale_batch_and_concatenate(upscale_model, latents, device)
+    if rife_status:
+        latents = rife_inference_with_latents(frame_interpolation_model, latents)
+
+    batch_size = latents.shape[0]
+    batch_video_frames = []
+    for batch_idx in range(batch_size):
+        pt_image = latents[batch_idx]
+        pt_image = torch.stack([pt_image[i] for i in range(pt_image.shape[0])])
+
+        image_np = VaeImageProcessor.pt_to_numpy(pt_image)
+        image_pil = VaeImageProcessor.numpy_to_pil(image_np)
+        batch_video_frames.append(image_pil)
+    return batch_video_frames
+
+
 def convert_to_gif(video_path):
     clip = VideoFileClip(video_path)
     gif_path = video_path.replace(".mp4", ".gif")
@@ -320,8 +351,8 @@ with gr.Blocks() as demo:
     </table>
         """)
 
-    @spaces.GPU(duration=180)
-    def generate(
+
+    def run(
         prompt,
         image_input,
         seed_value,
@@ -329,29 +360,11 @@ with gr.Blocks() as demo:
         rife_status,
         progress=gr.Progress(track_tqdm=True)
     ):
-        latents, seed = infer(
-            prompt,
-            image_input,
-            num_inference_steps=50,
-            guidance_scale=7.0,
-            seed=seed_value,
-            progress=progress,
-        )
-        if scale_status:
-            latents = upscale_batch_and_concatenate(upscale_model, latents, device)
-        if rife_status:
-            latents = rife_inference_with_latents(frame_interpolation_model, latents)
-
-        batch_size = latents.shape[0]
-        batch_video_frames = []
-        for batch_idx in range(batch_size):
-            pt_image = latents[batch_idx]
-            pt_image = torch.stack([pt_image[i] for i in range(pt_image.shape[0])])
-
-            image_np = VaeImageProcessor.pt_to_numpy(pt_image)
-            image_pil = VaeImageProcessor.numpy_to_pil(image_np)
-            batch_video_frames.append(image_pil)
-
+        batch_video_frames = generate(prompt,
+                                      image_input,
+                                      seed_value,
+                                      scale_status,
+                                      rife_status)
         video_path = save_video(batch_video_frames[0], fps=math.ceil((len(batch_video_frames[0]) - 1) / 6))
         video_update = gr.update(visible=True, value=video_path)
         gif_path = convert_to_gif(video_path)
@@ -361,7 +374,7 @@ with gr.Blocks() as demo:
         return video_path, video_update, gif_update, seed_update
 
     generate_button.click(
-        generate,
+        fn=run,
         inputs=[prompt, image_input, seed_param, enable_scale, enable_rife],
         outputs=[video_output, download_video_button, download_gif_button, seed_text],
     )

This should resolve the current error, but when I ran the Space, the inference took longer than the specified duration of 180 seconds and a "GPU task aborted" error occurred. Maybe you can decrease the number of inference steps to avoid the error, though.
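For reference, a minimal sketch of the pattern the diff applies, with placeholder names: the @spaces.GPU-decorated function lives at module level (and without a gr.Progress argument), while a thin wrapper inside gr.Blocks handles the UI side and delegates to it.

import gradio as gr
import spaces

@spaces.GPU(duration=180)  # GPU-bound work in a top-level function
def heavy_generate(prompt: str) -> str:
    # ... run the pipeline on CUDA here ...
    return f"video for: {prompt}"

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    output = gr.Textbox(label="Result")
    button = gr.Button("Generate")

    def run(user_prompt, progress=gr.Progress(track_tqdm=True)):
        # UI-only concerns (progress tracking, postprocessing) stay here;
        # the GPU work is delegated to the module-level function.
        return heavy_generate(user_prompt)

    button.click(fn=run, inputs=prompt, outputs=output)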

Thanks a lot, but is the maximum duration of ZeroGPU only 180s? Reducing the steps will seriously degrade the generation quality ...

Well, actually, I don't know about it. cc @cbensimon

It seems that the longest duration is only 120s

https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/118

But your Space doesn't seem to raise the maximum duration error with duration=180.

The discussion is a bit old and the bug that showed -1 day or something has already been fixed, so I'm not sure, but the 120-second limit mentioned in the discussion might have been fixed as well.

The current error is:
[screenshot of the error message]

Does this error mean that users need to subscribe to the Enterprise Hub to run the HF Space? But other Spaces using ZeroGPU don't seem to need that ...

Ah, OK, it involves a couple of known issues in spaces.

  1. There's a bug where the quota error is raised when the remaining quota is exactly the same as the specified duration.
  2. There's a bug where users are considered not logged in when the @spaces.GPU decorator is added to inner functions.

You can avoid the second issue by adding an attribute to the outer function, like wrapper_fn.zerogpu = True.
In your case, adding the following line after the definition of the run function would fix the second issue.

run.zerogpu = True

If I remember correctly, logged-in users have 300 seconds of quota, so they don't have to subscribe to PRO.
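Put together, a minimal sketch of the suggested workaround with placeholder logic: the attribute goes on the outer handler right after its definition.

import gradio as gr
import spaces

@spaces.GPU(duration=180)
def generate(prompt: str) -> str:
    return prompt.upper()  # stand-in for the real GPU work

def run(prompt, progress=gr.Progress(track_tqdm=True)):
    return generate(prompt)

run.zerogpu = True  # added right after the definition of run, as suggested above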

Thanks, let me try it

I have added "run.zerogpu = True", but I still get this error.

The current error is:
[screenshot of the error message]

Hmm, I see. Thanks for checking. This is the way @cbensimon told me before and it worked back then, but something might have changed since then. Let's wait for @cbensimon's response.

Thank you for your great support, looking forward to his reply.

PS: After I restarted the space, "No CUDA GPUs are available" appeared again ...
[screenshot of the error message]

Oh, weird.
I think it would be great if this Space can run on ZeroGPU, but maybe we can assign L40S with 1 hour sleep time in the meantime if your Space can run on it.

OK, I just switched the hardware to L40S to see if it works.

Thanks a lot, let's wait for it to restart. And should we add "run.zerogpu = True" when using L40S?

And should we add "run.zerogpu = True" when using L40S?

No, it should work without it. It doesn't affect normal GPU execution.
Also, spaces does nothing on Spaces with normal GPU, so you don't have to remove the @spaces.GPU decorator either.

The space runs normally (with L40S). It seems that ZeroGPU has some hidden bugs.

Looks like CUDA OOM occurred.

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/app/app.py", line 350, in run
    batch_video_frames, seed = generate(
  File "/home/user/app/app.py", line 156, in generate
    video_pt = pipe(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/models/pipeline_consisid.py", line 883, in __call__
    video = self.decode_latents(latents)
  File "/home/user/app/models/pipeline_consisid.py", line 463, in decode_latents
    frames = self.vae.decode(latents).sample
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1278, in decode
    decoded = self._decode(z).sample
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1249, in _decode
    z_intermediate, conv_cache = self.decoder(z_intermediate, conv_cache=conv_cache)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 970, in forward
    hidden_states, new_conv_cache[conv_cache_key] = up_block(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 647, in forward
    hidden_states, new_conv_cache[conv_cache_key] = resnet(
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 291, in forward
    hidden_states, new_conv_cache["norm1"] = self.norm1(hidden_states, zq, conv_cache=conv_cache.get("norm1"))
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 187, in forward
    new_f = norm_f * conv_y + conv_b
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.32 GiB. GPU 0 has a total capacity of 44.32 GiB of which 181.25 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 35.02 GiB is allocated by PyTorch, and 7.06 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
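As a hedged aside, the allocator hint at the end of the message can be tried by setting the environment variable before torch initializes CUDA (e.g. at the top of app.py, or as a Space variable):

import os

# Must be set before the first CUDA allocation; easiest is before importing torch.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # noqa: E402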

Oh, let's try again by adding these two lines:
pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
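A hedged sketch of that idea for a diffusers pipeline (the model ID is a placeholder; the two offload modes are alternatives, so typically only one is enabled, and since the OOM happened in VAE decoding, tiled decoding may also help if the VAE supports it):

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("org/placeholder-video-model", torch_dtype=torch.bfloat16)

# Either keep whole submodules on CPU until they are needed (faster)...
pipe.enable_model_cpu_offload()
# ...or offload layer by layer for the lowest memory footprint (slower):
# pipe.enable_sequential_cpu_offload()

# Tiled VAE decoding, if available, reduces peak memory during decoding:
# pipe.vae.enable_tiling()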

The Space can generate videos normally now, and it seems that only about 19 GB of GPU memory is needed.

Awesome!
Also, thanks for your efforts in debugging the ZeroGPU issues.
