Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM • Paper • 2401.02994 • Published Jan 4 • 48
Repeat After Me: Transformers are Better than State Space Models at Copying • Paper • 2402.01032 • Published Feb 1 • 22
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks • Paper • 2402.04248 • Published Feb 6 • 30
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality • Paper • 2405.21060 • Published May 31 • 63
Block Transformer: Global-to-Local Language Modeling for Fast Inference • Paper • 2406.02657 • Published Jun 4 • 37
Learning to (Learn at Test Time): RNNs with Expressive Hidden States • Paper • 2407.04620 • Published Jul 5 • 27
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond • Paper • 2410.02362 • Published Oct 3 • 16
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA • Paper • 2410.20672 • Published 25 days ago • 5
SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models • Paper • 2411.00233 • Published 21 days ago • 7