MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3 • 43
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF Paper • 2405.19320 • Published May 29 • 9