metadata
license: llama2
language:
- en
pipeline_tag: text-generation
tags:
- llama
- llama2
- amd
- meta
- facebook
- onnx
base_model:
- meta-llama/Llama-2-7b-hf
meta-llama/Llama-2-7b-hf
Introduction
- Quantization Tool: Quark 0.6.0
- OGA Model Builder: v0.5.1
- Postprocess
Quantization Strategy
- AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
- Excluded Layers: None
python3 quantize_quark.py \
    --model_dir "$model" \
    --output_dir "$output_dir" \
    --quant_scheme w_uint4_per_group_asym \
    --num_calib_data 128 \
    --quant_algo awq \
    --dataset pileval_for_awq_benchmark \
    --seq_len 512 \
    --model_export quark_safetensors \
    --data_type float16 \
    --exclude_layers [] \
    --custom_mode awq
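
To make the "Group 128 / Asymmetric / UINT4" part of the strategy concrete, here is a minimal pure-Python sketch of per-group asymmetric 4-bit quantization. It is an illustration only, not Quark's implementation: the function names are hypothetical, the quantization range is assumed to always include zero, and it uses plain round-to-nearest. AWQ additionally rescales weights using activation statistics from the calibration data before this step, which the sketch leaves out.

```python
def quantize_group_asym_uint4(weights, group_size=128):
    """Quantize a flat list of floats to UINT4 codes.

    Each group of `group_size` weights gets its own scale and zero point
    (asymmetric quantization), so codes span the full 0..15 range.
    """
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    qmax = 15  # largest value representable in 4 unsigned bits
    codes, scales, zero_points = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Assumption: extend the range to include 0.0 so the zero point
        # always lands inside 0..15.
        lo = min(min(group), 0.0)
        hi = max(max(group), 0.0)
        # An all-zero group would give a zero scale; fall back to 1.0.
        scale = (hi - lo) / qmax or 1.0
        zero_point = round(-lo / scale)
        scales.append(scale)
        zero_points.append(zero_point)
        for w in group:
            q = round(w / scale) + zero_point
            codes.append(min(qmax, max(0, q)))  # clamp to the UINT4 range
    return codes, scales, zero_points


def dequantize_group_asym_uint4(codes, scales, zero_points, group_size=128):
    """Map UINT4 codes back to floats using each group's scale and zero point."""
    return [
        (q - zero_points[i // group_size]) * scales[i // group_size]
        for i, q in enumerate(codes)
    ]
```

With two groups of 128 weights, this yields 256 codes plus two scales and two zero points. Each restored weight differs from the original by at most about half of its group's scale.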
OGA Model Builder
python builder.py \
    -i <quantized safetensor model dir> \
    -o <oga model output dir> \
    -p int4 \
    -e dml
- Post-processed to generate the Hybrid Model