Sparse Finetuning for Inference Acceleration of Large Language Models • Paper • 2310.06927 • Published Oct 10, 2023
Towards End-to-end 4-Bit Inference on Generative Large Language Models • Paper • 2310.09259 • Published Oct 13, 2023