---
license: openrail++
---

# CascadeV | An Implementation of the Würstchen Architecture for High-Resolution Video Generation

## News

**[2024.07.17]** We release the [code](https://github.com/bytedance/CascadeV) and pretrained [weights](https://huggingface.co/ByteDance/CascadeV) of a DiT-based video VAE, which supports video reconstruction with a high compression factor (1x32x32=1024). The T2V model is still on the way.

## Introduction

CascadeV is a video generation pipeline built upon the [Würstchen](https://openreview.net/forum?id=gU58d5QeGv) architecture. By using a highly compressed latent representation, we can generate longer videos with higher resolution.

## Video VAE

Comparison of Our Cascade Approach with Other VAEs (on Latent Space of Shape 8x32x32)

Video Reconstruction: Original (left) vs. Reconstructed (right) | *Click to view the videos*
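
The 1x32x32 compression factor quoted in the News section can be made concrete with a little arithmetic. The sketch below assumes the factor is ordered (time, height, width), which is consistent with the 8x32x32 latent shape used in the comparison but is not stated explicitly here. The helper names `compression_factor` and `latent_shape`, and the 8-frame 1024x1024 input, are illustrative assumptions and not part of the CascadeV codebase. Channel dimensions are ignored.

```python
from math import prod


def compression_factor(factors=(1, 32, 32)):
    """Overall reduction in the number of positions for (time, height, width) factors."""
    return prod(factors)


def latent_shape(frames, height, width, factors=(1, 32, 32)):
    """Latent grid size for a video, assuming each axis is divided by its factor."""
    ft, fh, fw = factors
    if frames % ft or height % fh or width % fw:
        raise ValueError("video dimensions must be divisible by the compression factors")
    return (frames // ft, height // fh, width // fw)


# Hypothetical input: 8 frames at 1024x1024 pixels.
print(compression_factor())         # 1 * 32 * 32
print(latent_shape(8, 1024, 1024))  # latent grid under the assumed ordering
```

Under these assumptions, an 8-frame 1024x1024 clip maps to an 8x32x32 latent grid, so each latent position stands in for 1024 pixel positions.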