Speed optimization is possible with low impact on VRAM (TeaCache, Efficient Time Embeddings, batching forward passes)

#111
by Jd1911 - opened

Hi Kijai

Hope all is well

I'm not sure (and don't know enough to judge) whether these optimizations can be implemented.

The team at Voltage Park ran an experiment that resulted in a 3.1x speedup in generation time. We can leave FlashAttention 3 out of it, but the other methods are super promising:

  • Efficient Time Embeddings
  • Batching Forward Passes
  • Intelligent Caching with TeaCache

It is not the standard TeaCache.

Do you think you can apply these through your wrapper?

More here: https://www.voltagepark.com/blog/accelerating-wan2-2-from-4-67s-to-1-5s-per-denoising-step-through-targeted-optimizations

Hey,

Thanks for the link, but I don't really see anything new there. We've had all this in ComfyUI for a while, and TeaCache is pretty outdated by now, with newer alternatives such as MagCache and EasyCache.
