vuiseng9/ov-mpt-7b-gsm8k-sparse70


	---
	pipeline_tag: text-generation
	tags:
	- openvino
	- mpt
	- sparse
	- quantization
	library_name: "OpenVINO"
	---

	This repo compares the performance of dense quantized MPT-7B against 70% sparse-quantized MPT-7B on OpenVINO. Both models are quantized to 8 bits for weights and activations (W8A8). The benchmark metric is decoding (next-token) latency at a context length of 512.
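To make "70% sparse-quantized" concrete, here is a minimal sketch in plain Python. It assumes symmetric per-tensor int8 quantization with the zero-point fixed at 0, which may differ from the scheme actually used to produce these models; the function names and the toy weight values are hypothetical. Under that assumption a pruned (zero) weight stays exactly zero after quantization, so the sparsity pattern survives.

```python
def quantize_symmetric_int8(weights):
    """Symmetric int8 quantization (assumed scheme): q = round(w / scale), zero-point 0."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy weights: 7 of 10 entries were pruned to zero.
weights = [0.0, 0.5, 0.0, 0.0, -1.27, 0.0, 0.0, 0.25, 0.0, 0.0]
q, scale = quantize_symmetric_int8(weights)

print("float sparsity:", sparsity(weights))
print("int8 sparsity: ", sparsity(q))
```

Note that very small nonzero weights can also round to zero, so the int8 sparsity can end up slightly higher than the float sparsity.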

	Target HW: Intel 4th gen Xeon (Sapphire Rapids)

	SW setup:
	```
	git clone https://huggingface.co/vuiseng9/ov-mpt-7b-gsm8k-sparse70
	pip install openvino==2024.2.0
	```

	## Benchmarking with OpenVINO

	1. Dense W8A8: `./benchmarkapp_w8a8.bash`
	2. 70% sparse W8A8: `./benchmarkapp_w8a8_sparse70.bash`

	Note: remove `numactl` from the commands in these scripts if your node does not support it.
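If you prefer not to edit the scripts by hand for each machine, one option is to decide at launch time whether to pin with `numactl`. The sketch below is hypothetical: it assumes `numactl` has been taken out of the script and is applied from outside instead, and the `--cpunodebind` / `--membind` arguments are an illustrative choice, not copied from the scripts in this repo. It only checks that the `numactl` binary is on `PATH`, not whether the node has more than one NUMA domain.

```python
import shutil


def maybe_with_numactl(cmd, node=0, which=shutil.which):
    """Prefix cmd with numactl pinning to one NUMA node if numactl is installed.

    `which` is injectable so the decision can be exercised without numactl present.
    """
    if which("numactl") is None:
        return list(cmd)  # fall back to running the command unpinned
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"] + list(cmd)


cmd = maybe_with_numactl(["./benchmarkapp_w8a8.bash"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the benchmark.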


	## Implementation of Sparse Weight Decompression in OpenVINO
	* The first commit of Sparse Weight Decompression in OpenVINO’s fork of oneDNN:
	https://github.com/openvinotoolkit/oneDNN/pull/158/files

	* You can browse the changed files via the left pane of that page.

	* Initialization: `src/cpu/reorder/simple_sparse_reorder.hpp` ([line 113](https://github.com/openvinotoolkit/oneDNN/pull/158/files#diff-f1445f832cd9979d9756873e3d8c30716976f51b6ce4640eae12762a417284e3R113))

	* Decompression: `src/cpu/x64/jit_brgemm_decompress_kernel.cpp` ([line 41](https://github.com/openvinotoolkit/oneDNN/pull/158/files#diff-98844e424b6687de78d47737e62f206dc9befcec6887dac8b2c52d0303dd3576R41))


	* To build the OpenVINO runtime from source for debugging, see the [build guide](https://github.com/openvinotoolkit/openvino/blob/master/docs/dev/build.md). `benchmark_app` is compiled as well.
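For intuition about the initialization and decompression steps listed above, here is a conceptual sketch in plain Python. It is not the oneDNN code: the real kernel is JIT-generated x64 code in the files linked above, and the bitmask-plus-packed-values layout used here is an assumption chosen for illustration. The idea is that only nonzero int8 weights are stored, together with one bit per position, and the dense row is rebuilt just before the matrix multiply so that less weight data has to be read from memory.

```python
def compress(dense):
    """Pack a dense row into (bitmask, nonzero values). Bit i is set if dense[i] != 0."""
    mask = 0
    packed = []
    for i, v in enumerate(dense):
        if v != 0:
            mask |= 1 << i
            packed.append(v)
    return mask, packed


def decompress(mask, packed, length):
    """Expand packed nonzero values back into a dense row of the given length."""
    dense = [0] * length
    values = iter(packed)
    for i in range(length):
        if (mask >> i) & 1:
            dense[i] = next(values)
    return dense


row = [0, 0, 7, 0, -3, 0, 0, 0, 12, 0]  # toy int8 weights, 70% zeros
mask, packed = compress(row)
restored = decompress(mask, packed, len(row))

print(f"stored values: {len(packed)} of {len(row)}")
print("round trip ok:", restored == row)
```

In this toy layout a 70% sparse row stores 3 values plus a 10-bit mask instead of 10 values; the storage format and savings of the actual implementation should be read from the PR itself.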

	## Related materials:
	[OpenVINO blog on Sparse-Quantized BERT](https://blog.openvino.ai/blog-posts/accelerate-inference-of-sparse-transformer-models-with-openvino-tm-and-4th-gen-intel-r-xeon-r-scalable-processors) ([corresponding notebook](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/116-sparsity-optimization/116-sparsity-optimization.ipynb))