Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding • Paper • 2411.18462 • Published 21 days ago • 6
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens • Paper • 2411.17691 • Published 22 days ago • 9
Star Attention: Efficient LLM Inference over Long Sequences • Paper • 2411.17116 • Published 22 days ago • 45