DPO STAR for math - a NeoByBy Collection

Updated Jul 31

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Paper • 2407.18248 • Published Jul 25 • 31