Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs Paper • 2508.10142 • Published Aug 13, 2025
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170 • Published Jun 13, 2024