The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24 • 2
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Paper • 2409.05865 • Published Sep 9 • 14
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning Paper • 2408.08441 • Published Aug 15 • 6
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt Paper • 2406.16377 • Published Jun 24 • 11
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models Paper • 2406.16863 • Published Jun 24 • 10
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning Paper • 2402.03046 • Published Feb 5 • 6
Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency Paper • 2403.00673 • Published Mar 1 • 1