mdouglas's Collections
Papers: Quantization
FP8-LM: Training FP8 Large Language Models (arXiv:2310.18313)
LLM-FP4: 4-Bit Floating-Point Quantized Transformers (arXiv:2310.16836)
TEQ: Trainable Equivalent Transformation for Quantization of LLMs (arXiv:2310.10944)
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers (arXiv:2309.16119)
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (arXiv:2306.00978)
LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning (arXiv:2305.18403)
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (arXiv:2211.10438)
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (arXiv:2210.17323)
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (arXiv:2208.07339)
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (arXiv:2309.05516)