Cache-to-Cache: Direct Semantic Communication Between Large Language Models Paper • 2510.03215 • Published Oct 3
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs Paper • 2510.07499 • Published about 1 month ago
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published 29 days ago
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning Paper • 2510.14211 • Published 23 days ago
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning Paper • 2510.19338 • Published 17 days ago
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published 18 days ago
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published 19 days ago
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs Paper • 2510.24514 • Published 11 days ago
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published 9 days ago
Exploring Conditions for Diffusion models in Robotic Control Paper • 2510.15510 • Published 22 days ago
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published 9 days ago
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction Paper • 2505.11254 • Published May 16