TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks
Abstract
TGPO, a Temporal Grounded Policy Optimization framework, decomposes STL tasks into subgoals and uses a hierarchical approach with dense rewards to improve task success rates in complex, long-horizon robotics tasks.
Learning control policies for complex, long-horizon tasks is a central challenge in robotics and autonomous systems. Signal Temporal Logic (STL) offers a powerful and expressive language for specifying such tasks, but its non-Markovian nature and inherent sparse reward make it difficult to be solved via standard Reinforcement Learning (RL) algorithms. Prior RL approaches focus only on limited STL fragments or use STL robustness scores as sparse terminal rewards. In this paper, we propose TGPO, Temporal Grounded Policy Optimization, to solve general STL tasks. TGPO decomposes STL into timed subgoals and invariant constraints and provides a hierarchical framework to tackle the problem. The high-level component of TGPO proposes concrete time allocations for these subgoals, and the low-level time-conditioned policy learns to achieve the sequenced subgoals using a dense, stage-wise reward signal. During inference, we sample various time allocations and select the most promising assignment for the policy network to rollout the solution trajectory. To foster efficient policy learning for complex STL with multiple subgoals, we leverage the learned critic to guide the high-level temporal search via Metropolis-Hastings sampling, focusing exploration on temporally feasible solutions. We conduct experiments on five environments, ranging from low-dimensional navigation to manipulation, drone, and quadrupedal locomotion. Under a wide range of STL tasks, TGPO significantly outperforms state-of-the-art baselines (especially for high-dimensional and long-horizon cases), with an average of 31.6% improvement in task success rate compared to the best baseline. The code will be available at https://github.com/mengyuest/TGPO
Community
Propose a new Reinforcement Learning approach to solve Signal Temporal Logic tasks.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Fully Learnable Neural Reward Machines (2025)
- Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks (2025)
- Constrained Decoding for Robotics Foundation Models (2025)
- LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning (2025)
- Reinforcement Learning with Anticipation: A Hierarchical Approach for Long-Horizon Tasks (2025)
- ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation (2025)
- Goal-Guided Efficient Exploration via Large Language Model in Reinforcement Learning (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper