SFT Models
Updated 22 days ago
We train a series of SFT models on RLHFlow's high-quality SFT dataset for research purposes.
RLHFlow/LLaMA3-SFT • Text Generation • Updated 22 days ago • 6.1k • 8
RLHFlow/RLHFlow-SFT-Dataset-ver2 • Viewer • Updated 23 days ago • 2.32M • 78 • 3
RLHFlow/LLaMA3-SFT-v2 • Text Generation • Updated 22 days ago • 5.92k
RLHFlow/Llama3-SFT-v2.0-epoch1 • Text Generation • Updated 22 days ago • 7
RLHFlow/Llama3-SFT-v2.0-epoch2 • Text Generation • Updated 22 days ago • 12
RLHFlow/Llama3-SFT-v2.0-epoch3 • Text Generation • Updated 22 days ago • 9