---
license: llama2
language:
  - en
pipeline_tag: text-generation
tags:
  - llama
  - llama2
  - amd
  - meta
  - facebook
  - onnx
base_model:
  - meta-llama/Llama-2-7b-hf
---

# meta-llama/Llama-2-7b-hf

- Introduction
  - Quantization Tool: Quark 0.6.0
  - OGA Model Builder: v0.5.1
  - Postprocess
- Quantization Strategy
  - AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
  - Excluded Layers: None
```bash
python3 quantize_quark.py \
    --model_dir "$model" \
    --output_dir "$output_dir" \
    --quant_scheme w_uint4_per_group_asym \
    --num_calib_data 128 \
    --quant_algo awq \
    --dataset pileval_for_awq_benchmark \
    --seq_len 512 \
    --model_export quark_safetensors \
    --data_type float16 \
    --exclude_layers [] \
    --custom_mode awq
```
    
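For intuition, here is a minimal NumPy sketch (not Quark code) of what the `w_uint4_per_group_asym` scheme above stores: each group of 128 weights gets its own FP16 scale and a UINT4 zero point, and the weights themselves are kept as unsigned 4-bit integers. AWQ additionally applies activation-aware per-channel scaling before this step, which the sketch omits.

```python
import numpy as np

def quantize_group_asym_uint4(w_group: np.ndarray):
    """Asymmetric UINT4 quantization of a single weight group (illustrative only)."""
    qmin, qmax = 0, 15                                   # UINT4 range
    w_min, w_max = float(w_group.min()), float(w_group.max())
    scale = max((w_max - w_min) / (qmax - qmin), 1e-8)   # guard against flat groups
    zero_point = int(np.clip(round(-w_min / scale), qmin, qmax))
    q = np.clip(np.round(w_group / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, np.float16(scale), np.uint8(zero_point)

def dequantize_group(q, scale, zero_point):
    """Reconstruct approximate FP16 weights from the quantized group."""
    return ((q.astype(np.float32) - float(zero_point)) * float(scale)).astype(np.float16)

# Quantize a [out_features, in_features] weight matrix with group size 128.
rng = np.random.default_rng(0)
w = rng.standard_normal((32, 256)).astype(np.float32)
group_size = 128
for row in w:
    for g in range(0, row.size, group_size):
        q, s, z = quantize_group_asym_uint4(row[g:g + group_size])
        w_hat = dequantize_group(q, s, z)  # close to row[g:g + group_size]
```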
- OGA Model Builder

```bash
python builder.py \
    -i <quantized safetensor model dir> \
    -o <oga model output dir> \
    -p int4 \
    -e dml
```
    
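After the builder step, the exported model can be exercised with the onnxruntime-genai (OGA) Python bindings. The loop below is a minimal sketch assuming a 0.5.x-era `onnxruntime_genai` install (the generation API has changed across releases) and a placeholder model directory; it is not taken from this repository.

```python
import onnxruntime_genai as og

# Placeholder path to the generated model directory (assumption, not from this repo).
model = og.Model("path/to/oga-model-dir")
tokenizer = og.Tokenizer(model)

prompt = "What is AWQ quantization?"
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = input_tokens          # 0.5.x-style input binding

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()           # removed in newer OGA releases
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```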
- PostProcessed to generate Hybrid Model