Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
Paper • 2509.23352 • Published • 3
DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization