Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published 20 days ago • 12 • 1
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14 • 59 • 2
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding Paper • 2507.19427 • Published Jul 25 • 18 • 2