Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process Paper • 2511.01718 • Published 2 days ago • 6
World Simulation with Video Foundation Models for Physical AI Paper • 2511.00062 • Published 8 days ago • 27
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published 7 days ago • 55
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Paper • 2510.27606 • Published 5 days ago • 25
Revisiting Multimodal Positional Encoding in Vision-Language Models Paper • 2510.23095 • Published 10 days ago • 15
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model Paper • 2510.27607 • Published 5 days ago • 8
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning Paper • 2510.27044 • Published 6 days ago • 3
AMO-Bench: Large Language Models Still Struggle in High School Math Competitions Paper • 2510.26768 • Published 6 days ago • 33
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks Paper • 2510.19195 • Published 15 days ago • 10
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published 7 days ago • 40
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published 6 days ago • 93
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Paper • 2510.22946 • Published 10 days ago • 16
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published 9 days ago • 26
Article KV Caching Explained: Optimizing Transformer Inference Efficiency By not-lain • Jan 30 • 159