M-RewardBench: Evaluating Reward Models in Multilingual Settings Paper • 2410.15522 • Published Oct 20 • 11
Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences Paper • 2403.07230 • Published Mar 12
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models Paper • 2406.16783 • Published Jun 24 • 4