Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 15 days ago • 61
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 15 days ago • 68
Gated Delta Networks: Improving Mamba2 with Delta Rule Paper • 2412.06464 • Published 15 days ago • 9
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6 • 30
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published Jul 16 • 54
In-Context Pretraining: Language Modeling Beyond Document Boundaries Paper • 2310.10638 • Published Oct 16, 2023 • 29
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers Paper • 2406.16747 • Published Jun 24 • 18
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8 • 32