SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Paper • 2501.18427 • Published 19 days ago • 16
Reasoning Datasets Collection Reasoning datasets that are trending 🔥 • 10 items • Updated Jan 3 • 24
FalconMamba 7B Collection This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 15 items • Updated 5 days ago • 33
Article Welcome FalconMamba: The first strong attention-free 7B model Aug 12, 2024 • 108
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? Paper • 2407.16607 • Published Jul 23, 2024 • 23
Jamba: A Hybrid Transformer-Mamba Language Model Paper • 2403.19887 • Published Mar 28, 2024 • 107
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 185
💫 StarCoder2 Collection StarCoder2 models and datasets! • 8 items • Updated Mar 1, 2024 • 83
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 608
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 127
Mixtures of Experts Unlock Parameter Scaling for Deep RL Paper • 2402.08609 • Published Feb 13, 2024 • 36
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis Paper • 2401.17093 • Published Jan 30, 2024 • 20
ChatQA: Building GPT-4 Level Conversational QA Models Paper • 2401.10225 • Published Jan 18, 2024 • 36
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding Paper • 2401.03003 • Published Jan 5, 2024 • 13
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation Paper • 2312.17276 • Published Dec 27, 2023 • 16