See the benchmark scripts in this repo. Install the two inference backends they use:

```shell
pip install "deepsparse-nightly[llm]==1.6.0.20231120"
pip install openvino==2023.3.0
```
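If it is unclear which of the two backends ended up in the active environment, a small stdlib check can report the installed versions (the helper name is hypothetical; package names are the ones used in the install commands above, and anything missing is reported as `None` rather than raising):

```python
from importlib import metadata

def backend_versions(packages=("deepsparse-nightly", "openvino")):
    """Return {package: version-or-None} for the benchmark backends."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions
```

For example, `backend_versions()["openvino"]` should read `"2023.3.0"` after the install step above.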
Benchmarking
- Clone this repo.
- Concatenate the split fp32 IR weights into a single file:

  ```shell
  cd ./models/neuralmagic/mpt-7b-gsm8k-pt/fp32
  cat openvino_model.bin.part-a* > openvino_model.bin
  ```

- Reproduce the Neural Magic paper numbers: `deepsparse_reproduce.bash`
- Run OpenVINO `benchmark_app`: `benchmarkapp_*.bash`
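The `cat` step above assumes a POSIX shell. A portable Python sketch of the same reassembly (the helper name and default arguments are illustrative, not part of the repo's scripts):

```python
from pathlib import Path

def join_parts(directory,
               prefix="openvino_model.bin.part-",
               out_name="openvino_model.bin"):
    """Concatenate split weight files (sorted by part suffix) into one file,
    mirroring `cat openvino_model.bin.part-a* > openvino_model.bin`."""
    directory = Path(directory)
    parts = sorted(directory.glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {prefix}* in {directory}")
    out = directory / out_name
    with out.open("wb") as dst:
        for part in parts:
            dst.write(part.read_bytes())
    return out
```

Sorting the matched parts lexicographically reproduces the shell glob's `part-aa`, `part-ab`, … ordering.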
Generating these IRs
The IRs in this repo were generated with the scripts at https://github.com/yujiepan-work/24h1-sparse-quantized-llm-ov.