The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • Published Feb 27, 2024
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity • arXiv:2101.03961 • Published Jan 11, 2021
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models • arXiv:2401.06066 • Published Jan 11, 2024