M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Paper • 2504.10449 • Published Apr 14, 2025 • 15
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published Jan 14, 2025 • 62
METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring Paper • 2501.02045 • Published Jan 3, 2025 • 22
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 56
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024 • 42
NeuralArTS: Structuring Neural Architecture Search with Type Theory Paper • 2110.08710 • Published Oct 17, 2021
Towards One Shot Search Space Poisoning in Neural Architecture Search Paper • 2111.07138 • Published Nov 13, 2021
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound Paper • 2406.06612 • Published Jun 6, 2024 • 16
Linguistic Collapse: Neural Collapse in (Large) Language Models Paper • 2405.17767 • Published May 28, 2024
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published Jun 7, 2024 • 59
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 42
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design Paper • 2401.14112 • Published Jan 25, 2024 • 20
PDFTriage: Question Answering over Long, Structured Documents Paper • 2309.08872 • Published Sep 16, 2023 • 53
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales Paper • 2308.01320 • Published Aug 2, 2023 • 45