amd/Llama-2-7b-hf-awq-g128-int4-asym-fp16-onnx-hybrid

metadata

license: llama2
language:
  - en
pipeline_tag: text-generation
tags:
  - llama
  - llama2
  - amd
  - meta
  - facebook
  - onnx
base_model:
  - meta-llama/Llama-2-7b-hf

meta-llama/Llama-2-7b-hf

Introduction
- Quantization Tool: Quark 0.6.0
- OGA Model Builder: v0.5.1
- Postprocess

Quantization Strategy

AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
Excluded Layers: None
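
To make the storage format above concrete, here is a minimal pure-Python sketch of asymmetric, per-group UINT4 weight quantization with a group size of 128. It illustrates only the number format (one scale and one zero point per group of 128 weights). It does not reproduce AWQ's activation-aware scaling search or Quark's actual implementation, and the function names `quantize_group`, `dequantize_group`, and `quantize_tensor` are hypothetical helpers introduced for illustration.

```python
from typing import List, Tuple

GROUP_SIZE = 128  # "Group 128" in the strategy above
QMAX = 15         # largest UINT4 code (2**4 - 1)


def quantize_group(weights: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric UINT4 quantization of one group.

    Returns (codes, scale, zero_point). Illustrative only, not Quark's code.
    """
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / QMAX
    if scale == 0.0:
        # A constant group would give a zero scale; fall back to 1.0.
        scale = 1.0
    # The zero point is the integer code that maps back to 0.0.
    zero_point = max(0, min(QMAX, round(-w_min / scale)))
    # Round each weight to the nearest step, shift, and clamp to [0, 15].
    codes = [max(0, min(QMAX, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map UINT4 codes back to approximate floating-point weights."""
    return [(q - zero_point) * scale for q in codes]


def quantize_tensor(
    weights: List[float], group_size: int = GROUP_SIZE
) -> List[Tuple[List[int], float, int]]:
    """Quantize a flat list of weights group by group."""
    return [
        quantize_group(weights[i:i + group_size])
        for i in range(0, len(weights), group_size)
    ]
```

Because each group carries its own scale and zero point, a group whose values straddle zero is reconstructed to within half a quantization step of the original weights. Activations are left in FP16 and are not quantized in this scheme.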

python3 quantize_quark.py \
  --model_dir "$model" \
  --output_dir "$output_dir" \
  --quant_scheme w_uint4_per_group_asym \
  --num_calib_data 128 \
  --quant_algo awq \
  --dataset pileval_for_awq_benchmark \
  --seq_len 512 \
  --model_export quark_safetensors \
  --data_type float16 \
  --exclude_layers [] \
  --custom_mode awq

OGA Model Builder

python builder.py \
  -i <quantized safetensor model dir> \
  -o <oga model output dir> \
  -p int4 \
  -e dml

Postprocess

The OGA model is then post-processed to generate the hybrid model.