MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark • Paper • 2406.01574 • Published Jun 3, 2024
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF • Paper • 2405.19320 • Published May 29, 2024