Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Abstract
Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging vision-language models (VLMs) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes and dynamic environments.
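To make the paradigm concrete, below is a minimal, hypothetical sketch of what VLM-generated monitoring code over tracked constraint elements might look like. This is not the authors' actual generated code: the function names, thresholds, and the `elements` dictionary are all assumptions for illustration. The key idea it demonstrates is that both proactive constraints (catch a foreseeable failure before it happens) and reactive constraints (detect a failure after it occurs) reduce to cheap per-frame checks over compact geometric elements.

```python
import numpy as np

# Hypothetical per-frame monitor in the spirit of CaM. In the paper, a
# VLM generates constraint-checking code like this; here every name and
# threshold is an illustrative assumption.

def gripper_above_target(gripper_pt: np.ndarray,
                         target_pt: np.ndarray,
                         xy_tol: float = 0.03) -> bool:
    """Proactive constraint: the gripper must stay horizontally aligned
    with the target before descending, preventing a foreseeable miss-grasp."""
    return np.linalg.norm(gripper_pt[:2] - target_pt[:2]) < xy_tol

def object_still_held(obj_pt: np.ndarray,
                      gripper_pt: np.ndarray,
                      max_gap: float = 0.05) -> bool:
    """Reactive constraint: if the tracked object drifts away from the
    gripper mid-transport, the object was dropped -> a failure occurred."""
    return np.linalg.norm(obj_pt - gripper_pt) < max_gap

def monitor_step(elements: dict[str, np.ndarray]) -> list[str]:
    """Evaluate all active constraints for one frame; return violations."""
    violations = []
    if not gripper_above_target(elements["gripper"], elements["target"]):
        violations.append("proactive: gripper drifted off the target")
    if not object_still_held(elements["object"], elements["gripper"]):
        violations.append("reactive: object no longer held (dropped)")
    return violations

# Per-frame usage: element positions would come from a real-time tracker.
frame_elements = {
    "gripper": np.array([0.40, 0.10, 0.30]),
    "target":  np.array([0.41, 0.11, 0.00]),
    "object":  np.array([0.40, 0.10, 0.28]),
}
print(monitor_step(frame_elements))  # [] -> no failure detected this frame
```

An empty violation list lets execution continue; a non-empty one would trigger replanning or recovery in the closed-loop system.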
Community
🔥Code-as-Monitor🔥
We present Code-as-Monitor, a novel paradigm leveraging VLMs for both reactive and proactive failure detection.
Highlights:
- Code-as-Monitor is the first framework to integrate both reactive and proactive failure detection.
- Code-as-Monitor leverages the proposed constraint elements to simplify real-time failure detection with high precision (a rough sketch of this idea follows the list).
- Code-as-Monitor achieves state-of-the-art (SOTA) performance in both simulated and real-world environments, and exhibits strong generalization to unseen scenarios, tasks, and objects.
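As a rough illustration of the constraint-element idea, here is our own sketch (the class, field names, and distance helper are assumptions, not the paper's API): a constraint-related entity or part is reduced to a few sampled points on a geometric primitive, which keeps per-frame tracking and constraint evaluation cheap.

```python
from dataclasses import dataclass
import numpy as np

# Illustrative "constraint element": a compact geometric abstraction
# (point / line / surface) of an entity or one of its parts. All names
# here are assumptions for the sketch.

@dataclass
class ConstraintElement:
    name: str           # e.g. "cup_rim", "table_surface"
    kind: str           # "point" | "line" | "surface"
    points: np.ndarray  # (N, 3) sample points defining the element

    def centroid(self) -> np.ndarray:
        return self.points.mean(axis=0)

def min_distance(a: ConstraintElement, b: ConstraintElement) -> float:
    """Minimum pairwise distance between two elements' sample points,
    a convenient primitive for distance-type constraints."""
    diffs = a.points[:, None, :] - b.points[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).min())

rim = ConstraintElement("cup_rim", "line",
                        np.array([[0.30, 0.10, 0.12], [0.32, 0.10, 0.12]]))
spout = ConstraintElement("kettle_spout", "point",
                          np.array([[0.31, 0.10, 0.20]]))
print(min_distance(rim, spout) < 0.10)  # True -> pouring constraint holds
```

Because an element is just a small point set, it can double as a visual prompt: the same geometry that the tracker follows can be drawn onto the image the VLM sees when it writes the monitoring code.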
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Addressing Failures in Robotics using Vision-Based Language Models (VLMs) and Behavior Trees (BT) (2024)
- ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting (2024)
- DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline (2024)
- ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution (2024)
- RoboFail: Analyzing Failures in Robot Learning Policies (2024)
- Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot (2024)
- Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics (2024)