RLHFlow's Collections
RLHFlow MATH Process Reward Model
Standard-format-preference-dataset
Mixture-of-preference-reward-modeling
RM-Bradley-Terry
PM-pair
Online RLHF
RLHFLow Reward Models
SFT Models
Mixture-of-preference-reward-modeling
Updated Apr 29
The mixture of preference datasets used for reward modeling.
Upvotes: 2
hendrydong/preference_700K • Updated Sep 28 • 700k rows • 994 downloads • 7 likes
weqweasdas/preference_dataset_mixture2_and_safe_pku • Updated Apr 29 • 555k rows • 49 downloads • 9 likes