CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published 19 days ago • 58 • 2
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Paper • 2410.12405 • Published 25 days ago • 13 • 2
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24 • 41 • 4