sail's Collections
Precision-RL
Active PRM
Oat-Zero: Understanding R1-Zero-Like Training
Sailor2 Language Models
RegMix: Data Mixture as Regression
Scaling Laws with Vocabulary
DICE
Sailor Language Models
Precision-RL
Updated 19 days ago
Defeating the Training-Inference Mismatch via FP16
Defeating the Training-Inference Mismatch via FP16
Paper • 2510.26788 • Published Oct 30 • 29
sail/Sanity-Test-R1D-1.5B
Viewer • Updated 18 days ago • 1.52k • 96 • 6