JudgeBench: A Benchmark for Evaluating LLM-based Judges Paper • 2410.12784 • Published Oct 16, 2024 • 44
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning Paper • 2407.04787 • Published Jul 5, 2024
Agent Instructs Large Language Models to be General Zero-Shot Reasoners Paper • 2310.03710 • Published Oct 5, 2023 • 2