RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published 7 days ago • 12
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28 • 15
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1 • 79
Text2Grad: Reinforcement Learning from Natural Language Feedback Paper • 2505.22338 • Published May 28 • 8
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability Paper • 2506.01789 • Published Jun 2 • 14
FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS Paper • 2505.16409 • Published May 22 • 2