Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning Paper • 2511.19900 • Published 10 days ago • 46
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image Paper • 2511.13648 • Published 17 days ago • 52
GigaWorld-0: World Models as Data Engine to Empower Embodied AI Paper • 2511.19861 • Published 10 days ago • 30
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published 10 days ago • 26
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published 10 days ago • 29
Computer-Use Agents as Judges for Generative User Interface Paper • 2511.15567 • Published 15 days ago • 51
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published Oct 29 • 64
The African Languages Lab: A Collaborative Approach to Advancing Low-Resource African NLP Paper • 2510.05644 • Published Oct 7 • 23
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs Paper • 2510.09507 • Published Oct 10 • 10
R-WoM: Retrieval-augmented World Model For Computer-use Agents Paper • 2510.11892 • Published Oct 13 • 21
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks Paper • 2510.12635 • Published Oct 14 • 15
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models Paper • 2510.13626 • Published Oct 15 • 45
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15 • 8