Bitsandbytes documentation

Overview

The bitsandbytes.functional API provides the low-level building blocks for the library’s features.

When to Use bitsandbytes.functional

  • When you need direct control over quantized operations and their parameters.
  • To build custom layers or operations leveraging low-bit arithmetic.
  • To integrate with other ecosystem tooling.
  • For experimental or research purposes requiring non-standard quantization or performance optimizations.

LLM.int8()

bitsandbytes.functional.int8_double_quant

( A: Tensor col_stats: typing.Optional[torch.Tensor] = None row_stats: typing.Optional[torch.Tensor] = None out_col: typing.Optional[torch.Tensor] = None out_row: typing.Optional[torch.Tensor] = None threshold = 0.0 ) Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Optional[torch.Tensor]]

Parameters

  • A (torch.Tensor with dtype torch.float16) — The input matrix.
  • col_stats (torch.Tensor, optional) — A pre-allocated tensor to hold the column-wise quantization scales.
  • row_stats (torch.Tensor, optional) — A pre-allocated tensor to hold the row-wise quantization scales.
  • out_col (torch.Tensor, optional) — A pre-allocated tensor to hold the column-wise quantized data.
  • out_row (torch.Tensor, optional) — A pre-allocated tensor to hold the row-wise quantized data.
  • threshold (float, optional) — An optional threshold for sparse decomposition of outlier features.

    No outliers are held back when 0.0. Defaults to 0.0.

Returns

Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Optional[torch.Tensor]]

A tuple containing the quantized tensor and relevant statistics.

  • torch.Tensor with dtype torch.int8: The row-wise quantized data.
  • torch.Tensor with dtype torch.int8: The column-wise quantized data.
  • torch.Tensor with dtype torch.float32: The row-wise quantization scales.
  • torch.Tensor with dtype torch.float32: The column-wise quantization scales.
  • torch.Tensor with dtype torch.int32, optional: A list of column indices which contain outlier features.

Determines the quantization statistics for input matrix A in accordance with the LLM.int8() algorithm.

The statistics are determined both row-wise and column-wise (transposed).

For more information, see the LLM.int8() paper.

This function is useful for training, but for inference it is advised to use `int8_vectorwise_quant` instead. This implementation performs additional column-wise transposed calculations which are not optimized.
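
A minimal usage sketch (assuming a CUDA device; the shapes and threshold value are illustrative):

import torch
import bitsandbytes.functional as F

# Random fp16 activations; a CUDA device is assumed.
A = torch.randn(16, 64, dtype=torch.float16, device="cuda")

# Quantize row-wise and column-wise in one call. With threshold > 0,
# columns containing outliers are reported in outlier_cols.
out_row, out_col, row_stats, col_stats, outlier_cols = F.int8_double_quant(A, threshold=6.0)

print(out_row.dtype, out_col.dtype)      # torch.int8, torch.int8
print(row_stats.shape, col_stats.shape)  # per-row and per-column scales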

bitsandbytes.functional.int8_linear_matmul

( A: Tensor B: Tensor out: typing.Optional[torch.Tensor] = None dtype = torch.int32 ) torch.Tensor

Parameters

  • A (torch.Tensor) — The first matrix operand with the data type torch.int8.
  • B (torch.Tensor) — The second matrix operand with the data type torch.int8.
  • out (torch.Tensor, optional) — A pre-allocated tensor used to store the result.
  • dtype (torch.dtype, optional) — The expected data type of the output. Defaults to torch.int32.

Returns

torch.Tensor

The result of the operation.

Raises

NotImplementedError or RuntimeError

  • NotImplementedError — The operation is not supported in the current environment.
  • RuntimeError — Raised when the operation cannot be completed for any other reason.

Performs an 8-bit integer matrix multiplication.

A linear transformation is applied such that out = A @ B.T. When possible, integer tensor core hardware is utilized to accelerate the operation.
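
A small, hedged example of calling the int8 matmul directly; the random int8 operands below stand in for properly quantized data, and a CUDA device with int8 support is assumed:

import torch
import bitsandbytes.functional as F

# Stand-in int8 operands; in practice these come from int8_vectorwise_quant.
A = torch.randint(-128, 128, (16, 64), dtype=torch.int8, device="cuda")
B = torch.randint(-128, 128, (32, 64), dtype=torch.int8, device="cuda")

# Computes A @ B.T with int32 accumulation.
C = F.int8_linear_matmul(A, B)
print(C.shape, C.dtype)  # torch.Size([16, 32]), torch.int32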

bitsandbytes.functional.int8_mm_dequant

( A: Tensor row_stats: Tensor col_stats: Tensor out: typing.Optional[torch.Tensor] = None bias: typing.Optional[torch.Tensor] = None ) torch.Tensor

Parameters

  • A (torch.Tensor with dtype torch.int32) — The result of a quantized int8 matrix multiplication.
  • row_stats (torch.Tensor) — The row-wise quantization statistics for the lhs operand of the matrix multiplication.
  • col_stats (torch.Tensor) — The column-wise quantization statistics for the rhs operand of the matrix multiplication.
  • out (torch.Tensor, optional) — A pre-allocated tensor to store the output of the operation.
  • bias (torch.Tensor, optional) — An optional bias vector to add to the result.

Returns

torch.Tensor

The dequantized result with an optional bias, with dtype torch.float16.

Performs dequantization on the result of a quantized int8 matrix multiplication.
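
As a hedged end-to-end sketch, a typical inference path quantizes both operands with int8_vectorwise_quant (documented below), multiplies them with int8_linear_matmul, and recovers fp16 with int8_mm_dequant; the shapes, names, and bias are illustrative:

import torch
import bitsandbytes.functional as F

x = torch.randn(16, 64, dtype=torch.float16, device="cuda")  # activations
W = torch.randn(32, 64, dtype=torch.float16, device="cuda")  # weight matrix
bias = torch.randn(32, dtype=torch.float16, device="cuda")

x_q, x_stats, _ = F.int8_vectorwise_quant(x)  # row-wise stats for the lhs
w_q, w_stats, _ = F.int8_vectorwise_quant(W)  # row-wise stats of W serve as
                                              # column-wise stats of W.T

acc = F.int8_linear_matmul(x_q, w_q)          # int32 accumulator for x @ W.T
y = F.int8_mm_dequant(acc, row_stats=x_stats, col_stats=w_stats, bias=bias)
print(y.dtype)  # torch.float16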

bitsandbytes.functional.int8_vectorwise_dequant

( A: Tensor stats: Tensor ) torch.Tensor with dtype torch.float32

Parameters

  • A (torch.Tensor with dtype torch.int8) — The quantized int8 tensor.
  • stats (torch.Tensor with dtype torch.float32) — The row-wise quantization statistics.

Returns

torch.Tensor with dtype torch.float32

The dequantized tensor.

Dequantizes a tensor with dtype torch.int8 to torch.float32.

bitsandbytes.functional.int8_vectorwise_quant

( A: Tensor threshold = 0.0 ) Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]

Parameters

  • A (torch.Tensor with dtype torch.float16) — The input tensor.
  • threshold (float, optional) — An optional threshold for sparse decomposition of outlier features.

    No outliers are held back when 0.0. Defaults to 0.0.

Returns

Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]

A tuple containing the quantized tensor and relevant statistics.

  • torch.Tensor with dtype torch.int8: The quantized data.
  • torch.Tensor with dtype torch.float32: The quantization scales.
  • torch.Tensor with dtype torch.int32, optional: A list of column indices which contain outlier features.

Quantizes a tensor with dtype torch.float16 to torch.int8 in accordance with the LLM.int8() algorithm.

For more information, see the LLM.int8() paper.
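
A minimal round-trip sketch combining int8_vectorwise_quant with int8_vectorwise_dequant (a CUDA device is assumed, the threshold is illustrative, and the reconstruction is approximate):

import torch
import bitsandbytes.functional as F

A = torch.randn(16, 64, dtype=torch.float16, device="cuda")

# With threshold=6.0, outlier columns are held back for a separate fp16 path
# and their indices are reported in outlier_cols.
A_q, stats, outlier_cols = F.int8_vectorwise_quant(A, threshold=6.0)

# Approximate float32 reconstruction from the int8 data and row-wise scales.
A_deq = F.int8_vectorwise_dequant(A_q, stats)
print((A_deq - A.float()).abs().max())  # small quantization error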

4-bit

bitsandbytes.functional.dequantize_4bit

( A: Tensor quant_state: typing.Optional[bitsandbytes.functional.QuantState] = None absmax: typing.Optional[torch.Tensor] = None out: typing.Optional[torch.Tensor] = None blocksize: int = 64 quant_type = 'fp4' ) torch.Tensor

Parameters

  • A (torch.Tensor) — The quantized input tensor.
  • quant_state (QuantState, optional) — The quantization state as returned by quantize_4bit. Required if absmax is not provided.
  • absmax (torch.Tensor, optional) — A tensor containing the scaling values. Required if quant_state is not provided and ignored otherwise.
  • out (torch.Tensor, optional) — A tensor to use to store the result.
  • blocksize (int, optional) — The size of the blocks. Defaults to 64. Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.
  • quant_type (str, optional) — The data type to use: nf4 or fp4. Defaults to fp4.

Returns

torch.Tensor

The dequantized tensor.

Raises

ValueError

  • ValueError — Raised when the input data type or blocksize is not supported.

Dequantizes a packed 4-bit quantized tensor.

The input tensor is dequantized by dividing it into blocks of blocksize values. The absolute maximum value within each block is used to scale the non-linear dequantization.
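
A brief, hedged sketch of dequantizing a packed 4-bit tensor with the QuantState produced by quantize_4bit (documented below); shapes and dtypes are illustrative:

import torch
import bitsandbytes.functional as F

W = torch.randn(128, 64, dtype=torch.float16, device="cuda")
W_4bit, quant_state = F.quantize_4bit(W, blocksize=64, quant_type="nf4")

# quant_state carries absmax, blocksize, quant_type, and the original shape,
# so no other arguments are needed for the reverse transform.
W_deq = F.dequantize_4bit(W_4bit, quant_state)
print(W_deq.shape, W_deq.dtype)  # torch.Size([128, 64]), torch.float16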

bitsandbytes.functional.dequantize_fp4

( A: Tensor quant_state: typing.Optional[bitsandbytes.functional.QuantState] = None absmax: typing.Optional[torch.Tensor] = None out: typing.Optional[torch.Tensor] = None blocksize: int = 64 )

bitsandbytes.functional.dequantize_nf4

( A: Tensor quant_state: typing.Optional[bitsandbytes.functional.QuantState] = None absmax: typing.Optional[torch.Tensor] = None out: typing.Optional[torch.Tensor] = None blocksize: int = 64 )

bitsandbytes.functional.gemv_4bit

( A: Tensor B: Tensor out: typing.Optional[torch.Tensor] = None transposed_A = False transposed_B = False state = None )

bitsandbytes.functional.quantize_4bit

( A: Tensor absmax: typing.Optional[torch.Tensor] = None out: typing.Optional[torch.Tensor] = None blocksize = 64 compress_statistics = False quant_type = 'fp4' quant_storage = torch.uint8 ) Tuple[torch.Tensor, QuantState]

Parameters

  • A (torch.Tensor) — The input tensor. Supports float16, bfloat16, or float32 datatypes.
  • absmax (torch.Tensor, optional) — A tensor to use to store the absmax values.
  • out (torch.Tensor, optional) — A tensor to use to store the result.
  • blocksize (int, optional) — The size of the blocks. Defaults to 64. Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.
  • compress_statistics (bool, optional) — Whether to additionally quantize the absmax values. Defaults to False.
  • quant_type (str, optional) — The data type to use: nf4 or fp4. Defaults to fp4.
  • quant_storage (torch.dtype, optional) — The dtype of the tensor used to store the result. Defaults to torch.uint8.

Returns

Tuple[torch.Tensor, QuantState]

A tuple containing the quantization results.

  • torch.Tensor: The quantized tensor with packed 4-bit values.
  • QuantState: The state object used to undo the quantization.

Raises

ValueError

  • ValueError — Raised when the input data type is not supported.

Quantize tensor A in blocks of 4-bit values.

Quantizes tensor A by dividing it into blocks which are independently quantized.
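
A hedged example of quantize_4bit, showing that two 4-bit values are packed per uint8 element and that compress_statistics additionally quantizes the block-wise absmax values (the tensor size is illustrative):

import torch
import bitsandbytes.functional as F

W = torch.randn(1024, 1024, dtype=torch.bfloat16, device="cuda")

# NF4 quantization with 64-element blocks; the absmax values are themselves
# quantized when compress_statistics=True.
W_4bit, state = F.quantize_4bit(
    W, blocksize=64, quant_type="nf4", compress_statistics=True
)

print(W_4bit.dtype, W_4bit.numel())       # torch.uint8, half the element count
print(state.blocksize, state.quant_type)  # 64, 'nf4'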

bitsandbytes.functional.quantize_fp4

( A: Tensor absmax: typing.Optional[torch.Tensor] = None out: typing.Optional[torch.Tensor] = None blocksize = 64 compress_statistics = False quant_storage = torch.uint8 )

bitsandbytes.functional.quantize_nf4

( A: Tensor absmax: typing.Optional[torch.Tensor] = None out: typing.Optional[torch.Tensor] = None blocksize = 64 compress_statistics = False quant_storage = torch.uint8 )

class bitsandbytes.functional.QuantState

( absmax shape = None code = None blocksize = None quant_type = None dtype = None offset = None state2 = None )

Container for quantization state components, used by Params4bit and similar classes.

as_dict

( packed = False )

Returns a dict of tensors and strings to use in serialization via _save_to_state_dict(). When packed is True, the returned dict[str, torch.Tensor] is a state_dict suitable for saving with safetensors.

from_dict

( qs_dict: typing.Dict[str, typing.Any] device: device )

Unpacks the components of a state_dict into a QuantState, converting values into strings, torch.dtype, ints, etc. where necessary.

qs_dict: based on the state_dict, with only the relevant keys, stripped of prefixes.

The item with key quant_state.bitsandbytes__[nf4/fp4] may contain minor and non-tensor quant state items.
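
A hedged sketch of serializing a QuantState with as_dict and restoring it with from_dict; the packed layout is the one intended for state_dict/safetensors saving, and the round trip below is assumed to work as shown:

import torch
import bitsandbytes.functional as F

W = torch.randn(256, 256, dtype=torch.float16, device="cuda")
W_4bit, state = F.quantize_4bit(W, quant_type="nf4")

# Packed form: dict[str, torch.Tensor] suitable for a state_dict.
qs_dict = state.as_dict(packed=True)

# Rebuild the state on a device; keys are expected without module prefixes.
restored = F.QuantState.from_dict(qs_dict, device=torch.device("cuda"))
W_deq = F.dequantize_4bit(W_4bit, restored)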

Dynamic 8-bit Quantization

Primitives used in the 8-bit optimizer quantization.

For more details see 8-Bit Approximations for Parallelism in Deep Learning

bitsandbytes.functional.dequantize_blockwise

( A: Tensor quant_state: typing.Optional[bitsandbytes.functional.QuantState] = None absmax: typing.Optional[torch.Tensor] = None code: typing.Optional[torch.Tensor] = None out: typing.Optional[torch.Tensor] = None blocksize: int = 4096 nested = False ) torch.Tensor

Parameters

  • A (torch.Tensor) — The quantized input tensor.
  • quant_state (QuantState, optional) — The quantization state as returned by quantize_blockwise. Required if absmax is not provided.
  • absmax (torch.Tensor, optional) — A tensor containing the scaling values. Required if quant_state is not provided and ignored otherwise.
  • code (torch.Tensor, optional) — A mapping describing the low-bit data type. Defaults to a signed 8-bit dynamic type. For more details, see 8-Bit Approximations for Parallelism in Deep Learning (https://arxiv.org/abs/1511.04561). Ignored when quant_state is provided.
  • out (torch.Tensor, optional) — A tensor to use to store the result.
  • blocksize (int, optional) — The size of the blocks. Defaults to 4096. Valid values are 64, 128, 256, 512, 1024, 2048, and 4096. Ignored when quant_state is provided.

Returns

torch.Tensor

The dequantized tensor. The datatype is indicated by quant_state.dtype and defaults to torch.float32.

Raises

ValueError

  • ValueError — Raised when the input data type is not supported.

Dequantize a tensor in blocks of values.

The input tensor is dequantized by dividing it into blocks of blocksize values. The absolute maximum value within each block is used to scale the non-linear dequantization.

bitsandbytes.functional.quantize_blockwise

( A: Tensor code: typing.Optional[torch.Tensor] = None absmax: typing.Optional[torch.Tensor] = None out: typing.Optional[torch.Tensor] = None blocksize = 4096 nested = False ) Tuple[torch.Tensor, QuantState]

Parameters

  • A (torch.Tensor) — The input tensor. Supports float16, bfloat16, or float32 datatypes.
  • code (torch.Tensor, optional) — A mapping describing the low-bit data type. Defaults to a signed 8-bit dynamic type. For more details, see 8-Bit Approximations for Parallelism in Deep Learning (https://arxiv.org/abs/1511.04561).
  • absmax (torch.Tensor, optional) — A tensor to use to store the absmax values.
  • out (torch.Tensor, optional) — A tensor to use to store the result.
  • blocksize (int, optional) — The size of the blocks. Defaults to 4096. Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.
  • nested (bool, optional) — Whether to additionally quantize the absmax values. Defaults to False.

Returns

Tuple[torch.Tensor, QuantState]

A tuple containing the quantization results.

  • torch.Tensor: The quantized tensor.
  • QuantState: The state object used to undo the quantization.

Raises

ValueError

  • ValueError — Raised when the input data type is not supported.

Quantize a tensor in blocks of values.

The input tensor is quantized by dividing it into blocks of blocksize values. The absolute maximum value within each block is calculated and used to scale the non-linear quantization.
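
A minimal round-trip sketch with quantize_blockwise and dequantize_blockwise (shown on a CPU tensor for simplicity, assuming your build supports the CPU path; the reconstruction is approximate):

import torch
import bitsandbytes.functional as F

p = torch.randn(4096, dtype=torch.float32)

# Quantize to 8 bits in blocks of 4096 values using the dynamic data type.
p_q, state = F.quantize_blockwise(p, blocksize=4096)

# Reconstruct; the output dtype follows state.dtype (float32 here).
p_deq = F.dequantize_blockwise(p_q, quant_state=state)
print((p_deq - p).abs().max())  # small quantization error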

Utility

bitsandbytes.functional.get_ptr

( A: typing.Optional[torch.Tensor] ) Optional[ct.c_void_p]

Parameters

  • A (Optional[Tensor]) — A PyTorch tensor.

Returns

Optional[ct.c_void_p]

A pointer to the underlying tensor data.

Gets the memory address of the first element of a tensor.
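
A tiny sketch of get_ptr for handing tensor data to C code through ctypes (illustrative only):

import torch
import bitsandbytes.functional as F

A = torch.zeros(10, dtype=torch.float32, device="cuda")
ptr = F.get_ptr(A)      # ctypes void pointer to A's first element
print(F.get_ptr(None))  # None is passed through as None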

bitsandbytes.functional.is_on_gpu

( tensors: typing.Iterable[typing.Optional[torch.Tensor]] )

Parameters

  • tensors (Iterable[Optional[torch.Tensor]]) — A list of tensors to verify.

Raises

RuntimeError

  • RuntimeError — Raised when the verification fails.

Verifies that the input tensors are all on the same device.

An input tensor may also be marked as paged, in which case the device placement is ignored.
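
A small example of the check (hedged; the exact error message may differ):

import torch
import bitsandbytes.functional as F

a = torch.randn(4, device="cuda")
b = torch.randn(4, device="cuda")
F.is_on_gpu([a, b])  # same device: passes

c = torch.randn(4)   # CPU tensor
try:
    F.is_on_gpu([a, c])  # mixed devices
except RuntimeError as e:
    print("verification failed:", e)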
