OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation • Paper • 2512.08294 • Published 8 days ago • 17
OneThinker: All-in-one Reasoning Model for Image and Video • Paper • 2512.03043 • Published 14 days ago • 30
Architecture Decoupling Is Not All You Need For Unified Multimodal Model • Paper • 2511.22663 • Published 19 days ago • 28
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views • Paper • 2510.18632 • Published Oct 21 • 21
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer • Paper • 2509.16197 • Published Sep 19 • 56