Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models • arXiv:2510.02300 • Published Oct 2025 • 5 upvotes
Less is More: Recursive Reasoning with Tiny Networks • arXiv:2510.04871 • Published Oct 2025 • 430 upvotes
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens • arXiv:2508.01191 • Published Aug 2, 2025 • 236 upvotes
Seq vs Seq: An Open Suite of Paired Encoders and Decoders • arXiv:2507.11412 • Published Jul 15, 2025 • 28 upvotes
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling • arXiv:2507.07955 • Published Jul 10, 2025 • 25 upvotes
Should We Still Pretrain Encoders with Masked Language Modeling? • arXiv:2507.00994 • Published Jul 1, 2025 • 78 upvotes
Energy-Based Transformers are Scalable Learners and Thinkers • arXiv:2507.02092 • Published Jul 2, 2025 • 68 upvotes
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs • arXiv:2506.14429 • Published Jun 17, 2025 • 44 upvotes
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale • arXiv:2505.03005 • Published May 5, 2025 • 36 upvotes
Implicit Language Models are RNNs: Balancing Parallelization and Expressivity • arXiv:2502.07827 • Published Feb 10, 2025 • 1 upvote
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought • arXiv:2504.05599 • Published Apr 8, 2025 • 85 upvotes
OmniSVG: A Unified Scalable Vector Graphics Generation Model • arXiv:2504.06263 • Published Apr 8, 2025 • 179 upvotes
RWKV-7 "Goose" with Expressive Dynamic State Evolution • arXiv:2503.14456 • Published Mar 18, 2025 • 152 upvotes
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning • arXiv:2503.07572 • Published Mar 10, 2025 • 46 upvotes
Forgetting Transformer: Softmax Attention with a Forget Gate • arXiv:2503.02130 • Published Mar 3, 2025 • 32 upvotes
EuroBERT: Scaling Multilingual Encoders for European Languages • arXiv:2503.05500 • Published Mar 7, 2025 • 79 upvotes