M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Paper • 2504.10449 • Published Apr 14 • 15
Tiny Language Model Datasets Collection Collection of Synthetic Datasets that can be used in pretraining of any Tiny Language Model • 14 items • Updated about 5 hours ago • 29
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks Paper • 2508.18672 • Published 27 days ago • 10
Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published 20 days ago • 12
AWorld: Orchestrating the Training Recipe for Agentic AI Paper • 2508.20404 • Published 25 days ago • 38
Article Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨ By Wauplin and 2 others • Jul 25 • 81
Article NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset By nvidia and 4 others • Aug 20 • 17
Article MCP for Research: How to Connect AI to Research Tools By dylanebert • Aug 18 • 54
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14 • 59
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes Paper • 2507.11407 • Published Jul 15 • 57
Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • Aug 11 • 73
Article Welcome GPT OSS, the new open-source model family from OpenAI! By reach-vb and 11 others • Aug 5 • 495
Article retrain-pipelines and the almighty function-caller By Aurelien-Morgan • Apr 28 • 8
Article Introducing Command A Vision: Multimodal AI built for Business By CohereLabs and 3 others • Jul 31 • 63