RLHFlow's Collections
PM-pair
Updated May 10
This is a collection of materials for training a pairwise preference model.
RLHFlow/pair-preference-dataset-mix1 • Viewer • Updated May 6 • 548k • 43 • 3
RLHFlow/pair-preference-model-LLaMA3-8B • Text Generation • Updated Oct 14 • 2.04k • 36
RLHFlow/pair_preference_model_dataset • Viewer • Updated Apr 20 • 699k • 64 • 4