AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 4 days ago • 37
Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues Paper • 2606.02754 • Published 6 days ago • 13
Advancing Creative Physical Intelligence in Large Multimodal Models Paper • 2605.26396 • Published 14 days ago • 19
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published May 6 • 22
SafeSwitch: Steering Unsafe LLM Behavior via Internal Activation Signals Paper • 2502.01042 • Published Feb 3, 2025 • 1
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2, 2025 • 70
EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents Paper • 2412.13549 • Published Dec 18, 2024
Self-Aligned Reward: Towards Effective and Efficient Reasoners Paper • 2509.05489 • Published Sep 5, 2025 • 1
DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal Paper • 2601.18081 • Published Jan 26 • 8