From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published 1 day ago • 39
NEO1_5 Collection From Pixels to Words -- Towards Native One-Vision Models at Scale • 3 items • Updated about 3 hours ago • 5
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence Paper • 2605.25979 • Published 3 days ago • 20
SpatialBench: Is Your Spatial Foundation Model an All-Round Player? Paper • 2605.27367 • Published 2 days ago • 59
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects Paper • 2605.21572 • Published 8 days ago • 50
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 16 days ago • 189
SenseNova-U1 Collection SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 9 items • Updated about 11 hours ago • 67
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 227
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published Apr 6 • 236
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer Paper • 2603.19227 • Published Mar 19 • 42
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction Paper • 2603.19231 • Published Mar 19 • 36
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation Paper • 2603.16669 • Published Mar 17 • 70
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Paper • 2603.15612 • Published Mar 16 • 153
view article Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 163
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published Mar 3 • 87
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published Feb 12 • 20