Submitted by Luo2003 51 RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation · 14 authors 4
Submitted by taesiri 27 MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer · 27 authors 1
Submitted by fjxmlzn 19 Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification · 6 authors 14 1
Submitted by yifanzhang114 14 BaseReward: A Strong Baseline for Multimodal Reward Model · 15 authors 1
Submitted by taesiri 4 A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning · 10 authors 108 1
Submitted by fangli3 2 RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes · 3 authors 1
Submitted by dlion168 2 Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems · 5 authors 1
Submitted by taesiri 1 Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents · 7 authors 1
Submitted by tetrisd 1 WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers · 3 authors 1
Submitted by leolin9248 - Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue · 8 authors 2