Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight Paper • 2511.16175 • Published 27 days ago • 12
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks Paper • 2506.00411 • Published May 31 • 31