Online RLHF - a RLHFlow Collection

Online RLHF

Updated Jun 12, 2024

Datasets, code, and models for online RLHF (i.e., iterative DPO)

RLHFlow/prompt-collection-v0.1
Viewer • Updated May 8, 2024 • 179k • 36 • 9

RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation • Updated Oct 14, 2024 • 2.46k • 38

sfairXC/FsfairX-LLaMA3-RM-v0.1
Text Classification • Updated Oct 14, 2024 • 5.24k • 53

RLHFlow/SFT-OpenHermes-2.5-Standard
Viewer • Updated Apr 24, 2024 • 1M • 35 • 2

RLHFlow/iterative-prompt-v1-iter2-20K
Viewer • Updated May 3, 2024 • 20k • 160 • 2

RLHFlow/iterative-prompt-v1-iter3-20K
Viewer • Updated May 3, 2024 • 20k • 145 • 3

RLHFlow/iterative-prompt-v1-iter1-20K
Viewer • Updated May 3, 2024 • 20k • 179 • 2

Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R
Text Generation • Updated Jun 12, 2024 • 93 • 77

RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published May 13, 2024 • 67

Salesforce/LLaMA-3-8B-SFR-SFT-R
Text Generation • Updated May 31, 2024 • 19 • 7

RLHFlow/LLaMA3-SFT
Text Generation • Updated Nov 3, 2024 • 7.39k • 9

RLHFlow/LLaMA3-iterative-DPO-final
Text Generation • Updated Oct 14, 2024 • 6.8k • 40

RLHFlow/iterative-prompt-v1-iter4-20K
Viewer • Updated Jun 12, 2024 • 20k • 153

RLHFlow/iterative-prompt-v1-iter5-20K
Viewer • Updated Jun 12, 2024 • 20k • 39

RLHFlow/iterative-prompt-v1-iter6-20K
Viewer • Updated Jun 12, 2024 • 20k • 84

RLHFlow/iterative-prompt-v1-iter7-20K
Viewer • Updated Jun 12, 2024 • 20k • 89

RLHFlow/iterative-prompt-v1-iter8-20K
Viewer • Updated Jun 12, 2024 • 20k • 85

RLHFlow/iterative-prompt-v1-iter9-20K
Viewer • Updated Jun 12, 2024 • 19.9k • 84 • 1