view article Article Transformers backend integration in SGLang By marcsun13 and 4 others • Jun 23 • 53
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache Paper • 2402.02750 • Published Feb 5, 2024 • 4
Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning Paper • 2304.11834 • Published Apr 24, 2023 • 1
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache Paper • 2402.02750 • Published Feb 5, 2024 • 4
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches Paper • 2407.01527 • Published Jul 1, 2024
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Paper • 2505.22662 • Published May 28 • 6
Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Paper • 2506.09501 • Published Jun 11 • 18
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Paper • 2505.22662 • Published May 28 • 6
Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Paper • 2506.09501 • Published Jun 11 • 18 • 2
Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Paper • 2506.09501 • Published Jun 11 • 18
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20 • 76
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20 • 76