Yanxiao Zhao

sdpkjc

https://sdpkjc.me

AI & ML interests

Reinforcement Learning

Recent Activity

updated a dataset about 2 months ago

TheFactoryX/edition_0001_Rowan-hellaswag-readymade

published a dataset about 2 months ago

TheFactoryX/edition_0001_Rowan-hellaswag-readymade

updated a dataset about 2 months ago

TheFactoryX/edition_0000_fancyzhx-ag_news-readymade

Organizations

upvoted 2 papers 4 months ago

SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

Paper • 2508.14040 • Published Aug 19 • 3

upvoted a collection 8 months ago

SATQuest

Collection

SATQuest Dataset Collections • 3 items • Updated Sep 4 • 1

upvoted an article 10 months ago

Article

Open R1: Update #3

Mar 11 • 296

upvoted a collection about 1 year ago

LLM Reasoning Papers

Collection

Papers to improve reasoning capabilities of LLMs • 20 items • Updated Jan 15 • 123

upvoted 3 papers about 1 year ago

A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models

Paper • 2411.19477 • Published Nov 29, 2024 • 6

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Paper • 2403.17031 • Published Mar 24, 2024 • 6

Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments

Paper • 2409.05865 • Published Sep 9, 2024 • 15

upvoted 5 papers over 1 year ago

Diffusion Policy Policy Optimization

Paper • 2409.00588 • Published Sep 1, 2024 • 20

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Paper • 2408.08441 • Published Aug 15, 2024 • 8

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Paper • 2406.16377 • Published Jun 24, 2024 • 13

FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models

Paper • 2406.16863 • Published Jun 24, 2024 • 11

TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11, 2024 • 31

upvoted 2 papers almost 2 years ago

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Paper • 2402.03046 • Published Feb 5, 2024 • 7

Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency

Paper • 2403.00673 • Published Mar 1, 2024 • 1
