view article Article 🤗 PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware Feb 10, 2023 • 106
SLiC-HF: Sequence Likelihood Calibration with Human Feedback Paper • 2305.10425 • Published May 17, 2023 • 6
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 47
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 177
panda-gym: Open-source goal-conditioned environments for robotic learning Paper • 2106.13687 • Published Jun 25, 2021 • 3
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning Paper • 2402.03046 • Published Feb 5, 2024 • 7
Distributional Preference Alignment of LLMs via Optimal Transport Paper • 2406.05882 • Published Jun 9, 2024 • 2
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 186
EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes Paper • 2508.00180 • Published Jul 31 • 1
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7 • 361