TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters • arXiv:2410.23168 • Published Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA • arXiv:2410.20672 • Published Oct 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss • arXiv:2410.17243 • Published Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models • arXiv:2410.17215 • Published Oct 2024
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models • arXiv:2410.05229 • Published Oct 7, 2024
Eurus (collection): Advancing LLM Reasoning Generalists with Preference Trees • 11 items
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale • arXiv:2409.17115 • Published Sep 25, 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning • arXiv:2409.12568 • Published Sep 19, 2024
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer • arXiv:2409.10819 • Published Sep 17, 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling • arXiv:2409.07146 • Published Sep 11, 2024
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining • arXiv:2409.02326 • Published Sep 3, 2024
Longhorn: State Space Models are Amortized Online Learners • arXiv:2407.14207 • Published Jul 19, 2024