Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published 29 days ago • 210 • 4
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Paper • 2603.19235 • Published 14 days ago • 94 • 5
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Paper • 2603.19228 • Published 14 days ago • 67 • 4
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 16 days ago • 106 • 5
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published 16 days ago • 106 • 7
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Paper • 2603.22212 • Published 10 days ago • 124 • 10
PixelSmile: Toward Fine-Grained Facial Expression Editing Paper • 2603.25728 • Published 7 days ago • 116 • 4
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published 14 days ago • 304 • 7
TAPS: Task Aware Proposal Distributions for Speculative Sampling Paper • 2603.27027 • Published 6 days ago • 137 • 4
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published 9 days ago • 90 • 4
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration Paper • 2603.24800 • Published 8 days ago • 65 • 5
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published 7 days ago • 147 • 4
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published 7 days ago • 150 • 6
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 10 days ago • 120 • 6
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels Paper • 2603.19312 • Published 20 days ago • 18 • 2
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 8 days ago • 124 • 9