MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design Paper • 2412.14590 • Published Dec 19, 2024 • 13
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design Paper • 2401.14112 • Published Jan 25, 2024 • 18