
# Idefics3-8B-Llama3-bnb_nf4

A bitsandbytes NF4 (4-bit NormalFloat) quantization of HuggingFaceM4/Idefics3-8B-Llama3.

## Quantization

The quantized weights were created with:

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,
    # Keep the LM head, vision encoder, and connector unquantized.
    llm_int8_skip_modules=["lm_head", "model.vision_model", "model.connector"],
)

model_nf4 = AutoModelForVision2Seq.from_pretrained(model_id, quantization_config=nf4_config)
```
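As a rough sense of what NF4 with double quantization buys here, the sketch below estimates the weight-storage footprint. The parameter count (8.5e9) and the ~0.4 bits/param saved by double-quantizing the per-block scales are illustrative assumptions, not measured values, and the estimate ignores the modules kept in full precision via `llm_int8_skip_modules`.

```python
# Back-of-envelope memory estimate for NF4 quantization (illustrative figures).
def estimate_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight-storage footprint in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

PARAMS = 8.5e9  # assumed rough parameter count of Idefics3-8B-Llama3

bf16 = estimate_gib(PARAMS, 16)       # unquantized bfloat16 baseline
nf4 = estimate_gib(PARAMS, 4)         # plain NF4: 4 bits per weight
# Double quantization also compresses the per-block scale factors,
# saving roughly 0.4 bits per parameter on top of plain 4-bit storage.
nf4_dq = estimate_gib(PARAMS, 4 - 0.4)

print(f"bf16:   {bf16:.1f} GiB")
print(f"nf4:    {nf4:.1f} GiB")
print(f"nf4+dq: {nf4_dq:.1f} GiB")
```

In practice the realized savings are smaller than the 4x ratio above, since the skipped vision and connector modules stay in higher precision.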
Safetensors · Model size: 5.08B params · Tensor types: F32, FP16, U8

Model tree for thwin27/Idefics3-8B-Llama3-bnb_nf4