Hydragen: High-Throughput LLM Inference with Shared Prefixes • Paper • 2402.05099 • Published Feb 7
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting • Paper • 2402.13720 • Published Feb 21
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention • Paper • 2405.12981 • Published May 21
Addition is All You Need for Energy-efficient Language Models • Paper • 2410.00907 • Published Oct 1