Accelerate Inference of NLP models with Post-Training Quantization API of NNCF
This tutorial demonstrates how to apply INT8 quantization to the Natural Language Processing model BERT, using the Post-Training Quantization API of NNCF. It uses the HuggingFace BERT PyTorch model, fine-tuned for the Microsoft Research Paraphrase Corpus (MRPC) task. The tutorial code is designed to be extendable to custom models and datasets.
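To give a rough intuition for what INT8 quantization does to a tensor, here is a minimal pure-Python sketch of symmetric per-tensor quantization. It is only an illustration under simplifying assumptions: the helper names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and the scheme NNCF actually applies (calibration on real data, per-channel scales, and so on) is more involved.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (illustrative only)."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest integer code bounds the error by about half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The round trip loses at most about half a quantization step per value, which is why accuracy is compared before and after quantization.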
Notebook Contents
The tutorial consists of the following steps:
- Downloading and preparing the MRPC model and dataset.
- Defining data loading functionality.
- Running the optimization pipeline.
- Comparing the F1 scores of the original and quantized models.
- Comparing the performance of the original and quantized models.
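The accuracy comparison in the steps above relies on the F1 score. As a minimal sketch of how that metric is computed for a binary task such as paraphrase detection, the snippet below uses a hypothetical helper `f1_score_binary` and made-up labels and predictions. It does not reproduce the notebook's own validation code or any real results.

```python
def f1_score_binary(labels, predictions, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up data for illustration: 1 = paraphrase, 0 = not a paraphrase.
labels = [1, 0, 1, 1, 0, 1]
original_preds = [1, 0, 1, 1, 1, 1]   # hypothetical FP32 model outputs
quantized_preds = [1, 0, 1, 0, 1, 1]  # hypothetical INT8 model outputs

f1_original = f1_score_binary(labels, original_preds)
f1_quantized = f1_score_binary(labels, quantized_preds)
print(f"original F1: {f1_original:.3f}, quantized F1: {f1_quantized:.3f}")
```

Running the same metric over the same validation set for both models keeps the comparison fair: any difference then comes from quantization rather than from the evaluation setup.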
Installation Instructions
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.