Optimizing Large Language Model Training Using FP4 Quantization • Paper • 2501.17116 • Published 5 days ago • 26
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing • Paper • 2501.00658 • Published Dec 31, 2024 • 7
Nested Attention: Semantic-aware Attention Values for Concept Personalization • Paper • 2501.01407 • Published about 1 month ago • 11
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published Dec 13, 2024 • 89
Active propulsion noise shaping for multi-rotor aircraft localization • Paper • 2402.17289 • Published Feb 27, 2024