Papers: Quantization - a mdouglas Collection

mdouglas 's Collections

Datasets: NeurIPS LLM Challenge 2023

Papers

Papers: GEC/Revision

Papers: Instruct

Papers: MoE/Ensemble

Papers: Evaluation

Papers: Quantization

Papers: Pruning

Papers: LLM as a Judge

llm.c

Papers: Quantization

updated Apr 10

FP8-LM: Training FP8 Large Language Models

Paper • 2310.18313 • Published Oct 27, 2023 • 31
LLM-FP4: 4-Bit Floating-Point Quantized Transformers

Paper • 2310.16836 • Published Oct 25, 2023 • 13
TEQ: Trainable Equivalent Transformation for Quantization of LLMs

Paper • 2310.10944 • Published Oct 17, 2023 • 9
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

Paper • 2309.16119 • Published Sep 28, 2023 • 1
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Paper • 2306.00978 • Published Jun 1, 2023 • 8
LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

Paper • 2305.18403 • Published May 28, 2023 • 2
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Paper • 2211.10438 • Published Nov 18, 2022 • 4
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Paper • 2210.17323 • Published Oct 31, 2022 • 8
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

Paper • 2208.07339 • Published Aug 15, 2022 • 4
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Paper • 2309.05516 • Published Sep 11, 2023 • 9