MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22 • 63
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data Paper • 2505.05427 • Published May 8 • 4