
Haiwen Diao

Paranioar

https://Paranioar.github.io/

AI & ML interests

Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model

Recent Activity

commented on a paper about 2 hours ago

From Pixels to Words -- Towards Native One-Vision Models at Scale

updated a collection about 3 hours ago

NEO1_5

upvoted a paper about 3 hours ago

From Pixels to Words -- Towards Native One-Vision Models at Scale


upvoted a paper about 3 hours ago

From Pixels to Words -- Towards Native One-Vision Models at Scale

Paper • 2605.28820 • Published 1 day ago • 39

upvoted a collection about 3 hours ago

NEO1_5

Collection

From Pixels to Words -- Towards Native One-Vision Models at Scale • 3 items • Updated about 3 hours ago • 5

upvoted 2 papers about 18 hours ago

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

Paper • 2605.25979 • Published 3 days ago • 20

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

Paper • 2605.27367 • Published 2 days ago • 59

upvoted a paper 6 days ago

PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

Paper • 2605.21572 • Published 8 days ago • 50

upvoted a paper 15 days ago

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published 16 days ago • 189

upvoted a collection 27 days ago

SenseNova-U1

Collection

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 9 items • Updated about 11 hours ago • 67

upvoted a paper about 1 month ago

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Paper • 2604.22748 • Published Apr 24 • 227

upvoted 2 papers about 2 months ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published Apr 6 • 236

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published Mar 29 • 147

upvoted 5 papers 2 months ago

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Paper • 2603.19227 • Published Mar 19 • 42

MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Paper • 2603.19231 • Published Mar 19 • 36

Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation

Paper • 2603.16669 • Published Mar 17 • 70

Demystifying Video Reasoning

Paper • 2603.16870 • Published Mar 17 • 372

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

Paper • 2603.15612 • Published Mar 16 • 153

upvoted an article 3 months ago

Article

NEO-unify: Building Native Multimodal Unified Models End to End • sensenova • Mar 5 • 163

upvoted 4 papers 3 months ago
