tags:
  - Text-to-Video

zeroscope_v2 576w

A watermark-free, Modelscope-based video model optimized for producing high-quality 16:9 compositions and smooth video output. This model was trained on 9,923 clips and 29,769 tagged frames at 24 frames and 576x320 resolution.
zeroscope_v2_576w is specifically designed for upscaling with zeroscope_v2_XL using vid2vid in the 1111 text2video extension by kabachuha. Using this model as a preliminary step allows for better overall compositions at higher resolutions in zeroscope_v2_XL, and permits faster exploration at 576x320 before moving to a high-resolution render. See some example outputs that have been upscaled to 1024x576 using zeroscope_v2_XL (courtesy of dotsimulate).

zeroscope_v2_576w uses 7.9 GB of VRAM when rendering 30 frames at 576x320.

Using it with the 1111 text2video extension

Download the files in the zs2_576w folder.
Replace the corresponding files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.
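
The two steps above can be sketched as a small Python script. This is a minimal sketch, not part of the extension: the function name `install_model_files` and both example paths are assumptions, so adjust them to wherever you downloaded the zs2_576w folder and wherever your web UI is installed. It copies every file in the source folder and overwrites files of the same name in the target.

```python
import shutil
from pathlib import Path


def install_model_files(src_dir, dst_dir):
    """Copy every file from src_dir into dst_dir, overwriting same-named files.

    Returns the sorted names of the files that were copied.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"source folder not found: {src}")
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            # copy2 replaces an existing file of the same name
            shutil.copy2(item, dst / item.name)
            copied.append(item.name)
    return copied


if __name__ == "__main__":
    # Hypothetical locations; change both to match your setup.
    downloaded = Path("zs2_576w")
    target = Path("stable-diffusion-webui") / "models" / "ModelScope" / "t2v"
    if downloaded.is_dir():
        print(install_model_files(downloaded, target))
```

Because the copy overwrites in place, you may want to back up the existing contents of the t2v directory first if you plan to switch back to another model.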

Upscaling recommendations

For upscaling, it's recommended to use zeroscope_v2_XL via vid2vid in the 1111 extension. It works best at 1024x576 with a denoise strength between 0.66 and 0.85. Remember to use the same prompt that was used to generate the original clip.
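
The recommendations above can be captured in a small settings check. This is an illustrative sketch only: `UpscaleSettings` and `check_upscale_settings` are hypothetical names, not part of the 1111 text2video extension, and the constants simply mirror the numbers given above.

```python
from dataclasses import dataclass

RECOMMENDED_SIZE = (1024, 576)   # width, height recommended for zeroscope_v2_XL
DENOISE_RANGE = (0.66, 0.85)     # recommended denoise strength, inclusive


@dataclass
class UpscaleSettings:
    prompt: str
    width: int
    height: int
    denoise_strength: float


def check_upscale_settings(settings, original_prompt):
    """Return a list of warnings; an empty list means the settings follow the recommendations."""
    warnings = []
    if (settings.width, settings.height) != RECOMMENDED_SIZE:
        warnings.append(
            f"resolution {settings.width}x{settings.height} differs from the recommended 1024x576"
        )
    low, high = DENOISE_RANGE
    if not low <= settings.denoise_strength <= high:
        warnings.append(
            f"denoise strength {settings.denoise_strength} is outside {low} to {high}"
        )
    if settings.prompt != original_prompt:
        warnings.append("prompt differs from the one used for the original clip")
    return warnings
```

Running the check before each vid2vid pass is a cheap way to catch a changed prompt or a denoise value that has drifted outside the recommended range.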

Known issues

Lower resolutions or fewer frames could lead to suboptimal output.
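
As a rough guard against this issue, a pre-render check could compare the requested settings with a baseline. The note above does not say what "lower" and "fewer" are measured against, so this sketch assumes the training configuration (576x320, 24 frames) as the reference; `check_base_settings` is a hypothetical helper, not part of any extension.

```python
# Assumed baseline: the training configuration stated for this model.
TRAINED_SIZE = (576, 320)   # width, height
TRAINED_FRAMES = 24


def check_base_settings(width, height, num_frames):
    """Return warnings for settings below the assumed baseline; empty list if none."""
    warnings = []
    if width < TRAINED_SIZE[0] or height < TRAINED_SIZE[1]:
        warnings.append(f"{width}x{height} is below 576x320; output may be suboptimal")
    if num_frames < TRAINED_FRAMES:
        warnings.append(f"{num_frames} frames is fewer than 24; output may be suboptimal")
    return warnings
```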

Thanks to camenduru, kabachuha, ExponentialML, dotsimulate, VANYA, polyware, tin2tin