@singhsidhukuldeep on Hugging Face: "If you have ~300+ GB of V-RAM, you can run Mochi from @genmo A SOTA model…"

singhsidhukuldeep

posted an update 28 days ago

If you have ~300+ GB of VRAM, you can run Mochi from @genmo.

A SOTA model that dramatically closes the gap between closed and open video generation models.

Mochi 1 introduces a new architecture that reasons jointly over a context window of 44,520 video tokens with full 3D attention. To localize each token, the model extends learnable rotary positional embeddings (RoPE) to three dimensions, and the network learns the mixing frequencies for the space and time axes end to end.
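To make the 3D RoPE idea concrete, here is a minimal pure-Python sketch. It is not Genmo's implementation: the function name `rope_3d`, the one-frequency-triple-per-channel-pair layout, and the example values are all assumptions for illustration. Each pair of channels is rotated by an angle that is a learned linear mix of the token's (t, y, x) coordinates.

```python
import math

def rope_3d(vec, pos, freqs):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles.

    vec   : list of floats with even length (one head's query or key)
    pos   : (t, y, x) coordinates of the token in the video latent grid
    freqs : one [f_t, f_y, f_x] triple per channel pair; in a real model
            these would be trainable parameters (hypothetical layout)
    """
    if len(vec) != 2 * len(freqs):
        raise ValueError("need one frequency triple per pair of channels")
    out = []
    for i, f in enumerate(freqs):
        # The angle mixes all three axes through the learned frequencies.
        angle = sum(f_axis * p_axis for f_axis, p_axis in zip(f, pos))
        c, s = math.cos(angle), math.sin(angle)
        a, b = vec[2 * i], vec[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Illustrative values only; nothing here comes from the Mochi weights.
freqs = [[0.5, 0.1, 0.0], [0.0, 0.3, 0.2]]
q = [1.0, 2.0, -1.0, 0.5]
k = [0.3, -0.7, 1.2, 0.4]

# Two query/key placements that share the same (t, y, x) offset of (1, 2, -3).
score_a = dot(rope_3d(q, (2, 5, 1), freqs), rope_3d(k, (1, 3, 4), freqs))
score_b = dot(rope_3d(q, (12, 6, 8), freqs), rope_3d(k, (11, 4, 11), freqs))
```

Because every angle is linear in the position, the attention score between a rotated query and a rotated key depends only on their relative offset in time and space, which is why `score_a` and `score_b` agree up to floating-point error.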

The model incorporates cutting-edge improvements, including:
- SwiGLU feedforward layers
- Query-key normalization for enhanced stability
- Sandwich normalization for controlled internal activations
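The three components listed above can be sketched in a few lines of pure Python. This is a toy illustration on plain lists, not Mochi's code: the post does not say which normalization the model uses, so RMS normalization without a learnable gain is an assumption here, and all function names are hypothetical.

```python
import math

def rms_norm(x, eps=1e-6):
    """Scale x to (approximately) unit root-mean-square. Assumed norm type."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def silu(v):
    """SiLU / swish activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feedforward: down( silu(gate(x)) * up(x) )."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])

def qk_norm_logit(q, k):
    """Attention logit with query-key normalization.

    Normalizing q and k first bounds the logit magnitude by sqrt(dim),
    no matter how large the raw activations grow.
    """
    qn, kn = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))

def sandwich_block(x, sublayer):
    """Sandwich normalization: normalize before and after the sublayer,
    then add the result to the residual stream."""
    update = rms_norm(sublayer(rms_norm(x)))
    return [a + b for a, b in zip(x, update)]

# Tiny demo with identity weights (illustrative values only).
identity = [[1.0, 0.0], [0.0, 1.0]]
ffn_out = swiglu_ffn([1.0, -1.0], identity, identity, identity)
big_logit = qk_norm_logit([1000.0, -2000.0, 3000.0, 500.0],
                          [4000.0, -1000.0, 2500.0, 800.0])
block_out = sandwich_block([0.5, -0.25], lambda h: [1e6 * v for v in h])
```

The demo shows why the two normalization tricks help stability: `big_logit` stays bounded even with very large inputs, and the update added by `sandwich_block` has unit RMS even though the toy sublayer multiplies its input by a million.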

What is currently available?
The base model delivers impressive 480p video generation with exceptional motion quality and prompt adherence. Released under the Apache 2.0 license, it's freely available for both personal and commercial applications.

What's Coming?
Genmo has announced Mochi 1 HD, scheduled for release later this year, which will feature:
- Enhanced 720p resolution
- Improved motion fidelity
- Better handling of edge cases such as warping in complex scenes

Natwar

28 days ago

Awesome Visuals!

adamo1139

27 days ago (edited 27 days ago)

Works fine on 24GB VRAM, with some limitations of course.
https://github.com/kijai/ComfyUI-MochiWrapper
https://github.com/victorchall/genmoai-smol
