{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "e2b748f3", "metadata": {}, "source": [ "# Sentiment Analysis with OpenVINO™\n", "\n", "**Sentiment analysis** is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. This notebook demonstrates how to convert and run a sequence classification model using OpenVINO. \n", "\n", "\n", "#### Table of contents:\n", "\n", "- [Imports](#Imports)\n", "- [Initializing the Model](#Initializing-the-Model)\n", "- [Initializing the Tokenizer](#Initializing-the-Tokenizer)\n", "- [Convert Model to OpenVINO Intermediate Representation format](#Convert-Model-to-OpenVINO-Intermediate-Representation-format)\n", " - [Select inference device](#Select-inference-device)\n", "- [Inference](#Inference)\n", " - [For a single input sentence](#For-a-single-input-sentence)\n", " - [Read from a text file](#Read-from-a-text-file)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "abc41ac0", "metadata": {}, "source": [ "## Imports\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ba2626e0", "metadata": {}, "outputs": [], "source": [ "%pip install \"openvino>=2023.1.0\" transformers \"torch>=2.1\" tqdm --extra-index-url https://download.pytorch.org/whl/cpu" ] }, { "cell_type": "code", "execution_count": 1, "id": "fe80a355", "metadata": { "tags": [] }, "outputs": [], "source": [ "import warnings\n", "from pathlib import Path\n", "import time\n", "from transformers import AutoModelForSequenceClassification, AutoTokenizer\n", "import numpy as np\n", "import openvino as ov" ] }, { "attachments": {}, "cell_type": "markdown", "id": "36add5c2", "metadata": {}, "source": [ "## Initializing the Model\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "We will use the transformer-based [DistilBERT base uncased finetuned SST-2](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) model from Hugging Face." ] }, { "cell_type": "code", "execution_count": 2, "id": "5db803ea", "metadata": { "tags": [] }, "outputs": [], "source": [ "checkpoint = \"distilbert-base-uncased-finetuned-sst-2-english\"\n", "model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=checkpoint)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ae70bbf5", "metadata": {}, "source": [ "## Initializing the Tokenizer\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Text Preprocessing cleans the text-based input data so it can be fed into the model. [Tokenization](https://towardsdatascience.com/tokenization-for-natural-language-processing-a179a891bad4) splits paragraphs and sentences into smaller units that can be more easily assigned meaning. It involves cleaning the data and assigning tokens or IDs to the words, so they are represented in a vector space where similar words have similar vectors. This helps the model understand the context of a sentence. Here, we will use [`AutoTokenizer`](https://huggingface.co/docs/transformers/main_classes/tokenizer) - a pre-trained tokenizer from Hugging Face:" ] }, { "cell_type": "code", "execution_count": 3, "id": "782bbebf", "metadata": { "tags": [] }, "outputs": [], "source": [ "tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=checkpoint)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4b00e300", "metadata": {}, "source": [ "## Convert Model to OpenVINO Intermediate Representation format\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "[Model conversion API](https://docs.openvino.ai/2024/openvino-workflow/model-preparation.html) facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices." ] }, { "cell_type": "code", "execution_count": 4, "id": "4794f066", "metadata": { "tags": [] }, "outputs": [], "source": [ "import torch\n", "\n", "ir_xml_name = checkpoint + \".xml\"\n", "MODEL_DIR = \"model/\"\n", "ir_xml_path = Path(MODEL_DIR) / ir_xml_name\n", "\n", "MAX_SEQ_LENGTH = 128\n", "input_info = [\n", " (ov.PartialShape([1, -1]), ov.Type.i64),\n", " (ov.PartialShape([1, -1]), ov.Type.i64),\n", "]\n", "default_input = torch.ones(1, MAX_SEQ_LENGTH, dtype=torch.int64)\n", "inputs = {\n", " \"input_ids\": default_input,\n", " \"attention_mask\": default_input,\n", "}\n", "\n", "ov_model = ov.convert_model(model, input=input_info, example_input=inputs)\n", "ov.save_model(ov_model, ir_xml_path)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "27cc074e", "metadata": {}, "source": [ "OpenVINO™ Runtime uses the [Infer Request](https://docs.openvino.ai/2024/openvino-workflow/running-inference/integrate-openvino-with-your-application/inference-request.html) mechanism which enables running models on different devices in asynchronous or synchronous manners. The model graph is sent as an argument to the OpenVINO API and an inference request is created. The default inference mode is AUTO but it can be changed according to requirements and hardware available. You can explore the different inference modes and their usage [in documentation.](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes.html)" ] }, { "cell_type": "code", "execution_count": 5, "id": "39248a56-11b3-42cc-bf5f-de05e1732c77", "metadata": {}, "outputs": [], "source": [ "core = ov.Core()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "74daf538-ac4d-4fb8-a069-db3af4cf40ea", "metadata": {}, "source": [ "### Select inference device\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "select device from dropdown list for running inference using OpenVINO" ] }, { "cell_type": "code", "execution_count": 6, "id": "1e27ef1d-e91e-4cbe-8a86-457ddeb0a1c7", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3d4bb3500d474fbcb4d52449d22df756", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import ipywidgets as widgets\n", "\n", "device = widgets.Dropdown(\n", " options=core.available_devices + [\"AUTO\"],\n", " value=\"AUTO\",\n", " description=\"Device:\",\n", " disabled=False,\n", ")\n", "\n", "device" ] }, { "cell_type": "code", "execution_count": 8, "id": "e31a2644", "metadata": { "tags": [] }, "outputs": [], "source": [ "warnings.filterwarnings(\"ignore\")\n", "compiled_model = core.compile_model(ov_model, device.value)\n", "infer_request = compiled_model.create_infer_request()" ] }, { "cell_type": "code", "execution_count": 9, "id": "de01fccc", "metadata": { "tags": [] }, "outputs": [], "source": [ "def softmax(x):\n", " \"\"\"\n", " Defining a softmax function to extract\n", " the prediction from the output of the IR format\n", " Parameters: Logits array\n", " Returns: Probabilities\n", " \"\"\"\n", "\n", " e_x = np.exp(x - np.max(x))\n", " return e_x / e_x.sum()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2e778507", "metadata": {}, "source": [ "## Inference\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "cc0c91a6", "metadata": { "tags": [] }, "outputs": [], "source": [ "def infer(input_text):\n", " \"\"\"\n", " Creating a generic inference function\n", " to read the input and infer the result\n", " into 2 classes: Positive or Negative.\n", " Parameters: Text to be processed\n", " Returns: Label: Positive or Negative.\n", " \"\"\"\n", "\n", " input_text = tokenizer(\n", " input_text,\n", " truncation=True,\n", " return_tensors=\"np\",\n", " )\n", " inputs = dict(input_text)\n", " label = {0: \"NEGATIVE\", 1: \"POSITIVE\"}\n", " result = infer_request.infer(inputs=inputs)\n", " for i in result.values():\n", " probability = np.argmax(softmax(i))\n", " return label[probability]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b60e79fd", "metadata": {}, "source": [ "### For a single input sentence\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 11, "id": "cf976f71", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Label: POSITIVE\n", "Total Time: 0.02 seconds\n" ] } ], "source": [ "input_text = \"I had a wonderful day\"\n", "start_time = time.perf_counter()\n", "result = infer(input_text)\n", "end_time = time.perf_counter()\n", "total_time = end_time - start_time\n", "print(\"Label: \", result)\n", "print(\"Total Time: \", \"%.2f\" % total_time, \" seconds\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "29b4d013", "metadata": {}, "source": [ "### Read from a text file\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5c267032", "metadata": {}, "outputs": [], "source": [ "# Fetch `notebook_utils` module\n", "import requests\n", "\n", "r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n", ")\n", "\n", "open(\"notebook_utils.py\", \"w\").write(r.text)\n", "from notebook_utils import download_file\n", "\n", "# Download the text from the openvino_notebooks storage\n", "vocab_file_path = download_file(\n", " \"https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/text/food_reviews.txt\",\n", " directory=\"data\",\n", ")" ] }, { "cell_type": "code", "execution_count": 12, "id": "63f57d28", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User Input: The food was horrible.\n", "\n", "Label: NEGATIVE \n", "\n", "User Input: We went because the restaurant had good reviews.\n", "Label: POSITIVE \n", "\n", "Total Time: 0.01 seconds\n" ] } ], "source": [ "start_time = time.perf_counter()\n", "with vocab_file_path.open(mode=\"r\") as f:\n", " input_text = f.readlines()\n", " for lines in input_text:\n", " print(\"User Input: \", lines)\n", " result = infer(lines)\n", " print(\"Label: \", result, \"\\n\")\n", "end_time = time.perf_counter()\n", "total_time = end_time - start_time\n", "print(\"Total Time: \", \"%.2f\" % total_time, \" seconds\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "nbTranslate": { "displayLangs": [ "*" ], "hotkey": "alt-t", "langInMainMenu": true, "sourceLang": "en", "targetLang": "fr", "useGoogleTranslate": true }, "openvino_notebooks": { "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/distilbert-sequence-classification/distilbert-sequence-classification.png?raw=true", "tags": { "categories": [ "Model Demos" ], "libraries": [], "other": [], "tasks": [ "Text Classification" ] } }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }