Submitted by Seungone Kim
RefineBench: Evaluating Refinement Capability of Language Models via Checklists
Carnegie Mellon University