ROOT: Robust Orthogonalized Optimizer for Neural Network Training • Paper • 2511.20626 • Published 11 days ago • 169
Kimi Linear: An Expressive, Efficient Attention Architecture • Paper • 2510.26692 • Published Oct 30 • 114
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction • Paper • 2510.01817 • Published Oct 2 • 15
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning • Paper • 2508.18756 • Published Aug 26 • 36
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning • Paper • 2505.17667 • Published May 23 • 88
Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning • Paper • 2505.09738 • Published May 14 • 10
IndicTTS Datasets • Collection • Datasets derived from the Indic TTS Database, a special corpus of Indian languages developed by the Speech Technology Consortium at IIT Madras. • 13 items • Updated Mar 6 • 12
Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus • Paper • 2410.14815 • Published Oct 18, 2024 • 1
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • Paper • 2412.13663 • Published Dec 18, 2024 • 158
ModernBERT • Collection • Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 151
Falcon3 • Collection • Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated about 1 month ago • 87
MiniPLM • Collection • Pre-trained models in MiniPLM: Knowledge Distillation for Pre-Training Language Models • 5 items • Updated Oct 21, 2024 • 2
MiniPLM: Knowledge Distillation for Pre-Training Language Models • Paper • 2410.17215 • Published Oct 22, 2024 • 16
Structured 3D Latents for Scalable and Versatile 3D Generation • Paper • 2412.01506 • Published Dec 2, 2024 • 84
SmolLM2 • Collection • State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5 • 294