---
license: llama2
language:
- en
pipeline_tag: text-generation
tags:
- llama
- llama2
- amd
- meta
- facebook
- onnx
base_model:
- meta-llama/Llama-2-7b-hf
---
# meta-llama/Llama-2-7b-hf
- ## Introduction
- Quantization Tool: Quark 0.6.0
- OGA Model Builder: v0.5.1
- Postprocess: hybrid model generation
- ## Quantization Strategy
- AWQ / Group 128 / Asymmetric / UINT4 weights / FP16 activations (a dequantization sketch follows the command below)
- Excluded Layers: None
```
python3 quantize_quark.py \
--model_dir "$model" \
--output_dir "$output_dir" \
--quant_scheme w_uint4_per_group_asym \
--num_calib_data 128 \
--quant_algo awq \
--dataset pileval_for_awq_benchmark \
--seq_len 512 \
--model_export quark_safetensors \
--data_type float16 \
--exclude_layers [] \
--custom_mode awq
```
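In the `w_uint4_per_group_asym` scheme, each group of 128 weights shares one scale and one zero point. As a rough illustration only (not part of the Quark toolchain, and omitting AWQ's activation-aware scale search), the NumPy sketch below shows how a single group could be quantized to asymmetric UINT4 and dequantized back to FP16:
```
# Illustrative sketch of group-wise asymmetric UINT4 (de)quantization,
# group size 128. Not the Quark implementation; AWQ scale search omitted.
import numpy as np

GROUP_SIZE = 128

def quantize_group(w):
    # w: 1-D float array of GROUP_SIZE weights
    w_min, w_max = w.min(), w.max()
    scale = max(w_max - w_min, 1e-8) / 15.0          # UINT4 range: 0..15
    zero_point = np.clip(np.round(-w_min / scale), 0, 15)
    q = np.clip(np.round(w / scale) + zero_point, 0, 15).astype(np.uint8)
    return q, np.float16(scale), np.uint8(zero_point)

def dequantize_group(q, scale, zero_point):
    # Map stored UINT4 values back to FP16 weights
    return ((q.astype(np.float32) - zero_point) * scale).astype(np.float16)

# Quantize one group of 128 random weights and check the reconstruction error
w = np.random.randn(GROUP_SIZE).astype(np.float32)
q, scale, zp = quantize_group(w)
w_hat = dequantize_group(q, scale, zp)
print("max abs error:", np.abs(w - w_hat.astype(np.float32)).max())
```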
- ## OGA Model Builder
```
python builder.py \
-i <quantized safetensor model dir> \
-o <oga model output dir> \
-p int4 \
-e dml
```
- Post-processed to generate the hybrid model
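Once built, the model can be driven through the onnxruntime-genai (OGA) Python API. The sketch below is only illustrative: the model directory is a placeholder, and the generation-loop calls differ slightly across onnxruntime-genai releases (older releases use `params.input_ids` together with `generator.compute_logits()` instead of `generator.append_tokens()`).
```
import onnxruntime_genai as og

# Placeholder path: directory produced by builder.py / post-processing
model = og.Model("<oga model output dir>")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is AWQ quantization?"))

# Token-by-token generation loop
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```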