RL from Synthetic Feedback

community
Activity Feed

nathan lile, SynthLabs, Angel Raychev

nlile 
authored a paper 6 months ago

Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

Paper • 2506.05256 • Published Jun 5 • 2
nlile 
authored 2 papers 9 months ago

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Paper • 2502.17387 • Published Feb 24 • 7

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Paper • 2503.01307 • Published Mar 3 • 38
nlile 
authored a paper 11 months ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 99
nlile 
authored a paper about 1 year ago

Generative Reward Models

Paper • 2410.12832 • Published Oct 2, 2024 • 7
nlile 
authored 2 papers over 1 year ago

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Paper • 2407.17387 • Published Jul 24, 2024 • 20

Suppressing Pink Elephants with Direct Principle Feedback

Paper • 2402.07896 • Published Feb 12, 2024 • 11