X-Part: high fidelity and structure coherent shape decomposition Paper • 2509.08643 • Published about 1 month ago
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark Paper • 2509.09680 • Published 30 days ago
Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds Paper • 2508.14892 • Published Aug 20
"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries Paper • 2508.15752 • Published Aug 21
Visual Autoregressive Modeling for Instruction-Guided Image Editing Paper • 2508.15772 • Published Aug 21
ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling Paper • 2508.15767 • Published Aug 21
SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass Paper • 2508.15769 • Published Aug 21
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Paper • 2508.15760 • Published Aug 21
Representing Speech Through Autoregressive Prediction of Cochlear Tokens Paper • 2508.11598 • Published Aug 15
S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models Paper • 2508.12880 • Published Aug 18
Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping Paper • 2508.12466 • Published Aug 17
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Paper • 2508.13009 • Published Aug 18
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning Paper • 2508.10419 • Published Aug 14