Quant: disable second-stage compression!
#17 opened by zdxpan
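import bitsandbytes as bnb

new_layer = bnb.nn.Linear4bit(
    in_features,
    out_features,
    bias=has_bias,
    compute_dtype=bnb_4bit_compute_dtype,
    compress_statistics=False,  # False disables double quantization (second-stage compression of the quantization statistics)
    quant_type=quant_type,
)
# Copy the original weights into the new layer
new_layer.load_state_dict(child.state_dict())
# The 4-bit quantization itself runs when the layer is moved to a CUDA device
new_layer = new_layer.to(device)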
See the comment on compress_statistics: setting it to False disables double quantization.
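For a rough sense of what that flag trades away, here is a minimal sketch of the per-weight storage overhead of the quantization statistics with and without double quantization. The numbers are assumptions taken from the QLoRA scheme (one 32-bit absmax per block of 64 weights; with double quantization, 8-bit absmax values plus one 32-bit constant per 256 of them), not values read from this thread, and the helper names are hypothetical.

```python
def overhead_bits_per_param(block_size=64, absmax_bits=32, double_quant=False,
                            dq_block_size=256, dq_bits=8, dq_const_bits=32):
    """Bits of quantization statistics stored per weight (assumed QLoRA-style layout)."""
    if not double_quant:
        # One full-precision absmax constant per block of weights.
        return absmax_bits / block_size
    # The absmax constants are themselves quantized to dq_bits each, plus one
    # second-level constant per dq_block_size first-level constants.
    return dq_bits / block_size + dq_const_bits / (block_size * dq_block_size)


def model_overhead_mib(n_params, **kwargs):
    """Total statistics overhead in MiB for a model with n_params 4-bit weights."""
    return n_params * overhead_bits_per_param(**kwargs) / 8 / 2**20


if __name__ == "__main__":
    print(overhead_bits_per_param())                   # 0.5
    print(overhead_bits_per_param(double_quant=True))  # 0.126953125
    # Extra memory paid for disabling double quantization on a 7B-parameter model.
    n = 7_000_000_000
    print(model_overhead_mib(n) - model_overhead_mib(n, double_quant=True))
```

Under these assumptions, compress_statistics=False costs about 0.37 extra bits per weight in exchange for skipping the second dequantization step.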
from transformers import BitsAndBytesConfig

double_quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
)
I tried it with diffusers and it works. How could I make it work for ComfyUI and Stable Diffusion "safetensors" files? Could you help me?