{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.7.9","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"## Decision Trees tutorial & improving hosting with skops 🌲\n\nIn this notebook, I will walk you through decision trees and how to inspect them. Later, we will improve model hosting using [skops](https://skops.readthedocs.io/en/stable/). ","metadata":{}},{"cell_type":"code","source":"# This Python 3 environment comes with many helpful analytics libraries installed\n# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python\n# For example, here are several helpful packages to load\n\nimport numpy as np # linear algebra\nimport pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n\n# Input data files are available in the read-only \"../input/\" directory\n# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory\n\nimport os\nfor dirname, _, filenames in os.walk('/kaggle/input'):\n    for filename in filenames:\n        print(os.path.join(dirname, filename))\n\n# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using \"Save & Run All\" \n# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current 
session","metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","_kg_hide-input":true,"_kg_hide-output":true,"execution":{"iopub.status.busy":"2022-12-01T13:21:07.411748Z","iopub.execute_input":"2022-12-01T13:21:07.412350Z","iopub.status.idle":"2022-12-01T13:21:07.419860Z","shell.execute_reply.started":"2022-12-01T13:21:07.412261Z","shell.execute_reply":"2022-12-01T13:21:07.418325Z"},"trusted":true},"execution_count":1,"outputs":[]},{"cell_type":"code","source":"!pip install skops","metadata":{"_kg_hide-output":true,"execution":{"iopub.status.busy":"2022-12-01T13:21:09.851317Z","iopub.execute_input":"2022-12-01T13:21:09.851890Z","iopub.status.idle":"2022-12-01T13:21:15.803438Z","shell.execute_reply.started":"2022-12-01T13:21:09.851859Z","shell.execute_reply":"2022-12-01T13:21:15.802081Z"},"trusted":true},"execution_count":2,"outputs":[{"name":"stdout","text":"Requirement already satisfied: skops in /opt/conda/lib/python3.7/site-packages (0.3.0)\nRequirement already satisfied: tabulate>=0.8.8 in /opt/conda/lib/python3.7/site-packages (from skops) (0.8.8)\nRequirement already satisfied: typing-extensions>=3.7 in /opt/conda/lib/python3.7/site-packages (from skops) (3.7.4.3)\nRequirement already satisfied: huggingface-hub>=0.10.1 in /opt/conda/lib/python3.7/site-packages (from skops) (0.11.1)\nRequirement already satisfied: scikit-learn>=0.24 in /opt/conda/lib/python3.7/site-packages (from skops) (0.24.1)\nRequirement already satisfied: packaging>=20.9 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.10.1->skops) (21.3)\nRequirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.10.1->skops) (4.55.1)\nRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.10.1->skops) (5.3.1)\nRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.10.1->skops) 
(2.25.1)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.10.1->skops) (3.3.0)\nRequirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.10.1->skops) (3.0.12)\nRequirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=20.9->huggingface-hub>=0.10.1->skops) (2.4.7)\nRequirement already satisfied: scipy>=0.19.1 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.5.4)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (2.1.0)\nRequirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.0.0)\nRequirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.19.5)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->huggingface-hub>=0.10.1->skops) (3.4.0)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.10.1->skops) (1.26.2)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.10.1->skops) (2020.12.5)\nRequirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.10.1->skops) (2.10)\nRequirement already satisfied: chardet<5,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.10.1->skops) (3.0.4)\n\u001b[33mWARNING: You are using pip version 21.0.1; however, version 22.3.1 is available.\nYou should consider upgrading via the '/opt/conda/bin/python3.7 -m pip install --upgrade pip' command.\u001b[0m\n","output_type":"stream"}]},{"cell_type":"markdown","source":"We will use breast 
cancer data from `sklearn.datasets`. We will load the dataset and split it into training and test sets. ","metadata":{}},{"cell_type":"code","source":"from sklearn.datasets import load_breast_cancer\nfrom sklearn.model_selection import train_test_split","metadata":{"execution":{"iopub.status.busy":"2022-12-01T13:21:16.196274Z","iopub.execute_input":"2022-12-01T13:21:16.196592Z","iopub.status.idle":"2022-12-01T13:21:16.523656Z","shell.execute_reply.started":"2022-12-01T13:21:16.196567Z","shell.execute_reply":"2022-12-01T13:21:16.522085Z"},"trusted":true},"execution_count":3,"outputs":[]},{"cell_type":"code","source":"cancer = load_breast_cancer()\n# Pass feature_names directly; wrapping it in a list would create MultiIndex columns\ndata = pd.DataFrame(cancer.data, columns=cancer.feature_names)\ndata.head()","metadata":{"execution":{"iopub.status.busy":"2022-12-01T13:21:16.668054Z","iopub.execute_input":"2022-12-01T13:21:16.668383Z","iopub.status.idle":"2022-12-01T13:21:16.719596Z","shell.execute_reply.started":"2022-12-01T13:21:16.668356Z","shell.execute_reply":"2022-12-01T13:21:16.717624Z"},"trusted":true},"execution_count":4,"outputs":[{"execution_count":4,"output_type":"execute_result","data":{"text/plain":" mean radius mean texture mean perimeter mean area mean smoothness \\\n0 17.99 10.38 122.80 1001.0 0.11840 \n1 20.57 17.77 132.90 1326.0 0.08474 \n2 19.69 21.25 130.00 1203.0 0.10960 \n3 11.42 20.38 77.58 386.1 0.14250 \n4 20.29 14.34 135.10 1297.0 0.10030 \n\n mean compactness mean concavity mean concave points mean symmetry \\\n0 0.27760 0.3001 0.14710 0.2419 \n1 0.07864 0.0869 0.07017 0.1812 \n2 0.15990 0.1974 0.12790 0.2069 \n3 0.28390 0.2414 0.10520 0.2597 \n4 0.13280 0.1980 0.10430 0.1809 \n\n mean fractal dimension ... worst radius worst texture worst perimeter \\\n0 0.07871 ... 25.38 17.33 184.60 \n1 0.05667 ... 24.99 23.41 158.80 \n2 0.05999 ... 23.57 25.53 152.50 \n3 0.09744 ... 14.91 26.50 98.87 \n4 0.05883 ... 
22.54 16.67 152.20 \n\n worst area worst smoothness worst compactness worst concavity \\\n0 2019.0 0.1622 0.6656 0.7119 \n1 1956.0 0.1238 0.1866 0.2416 \n2 1709.0 0.1444 0.4245 0.4504 \n3 567.7 0.2098 0.8663 0.6869 \n4 1575.0 0.1374 0.2050 0.4000 \n\n worst concave points worst symmetry worst fractal dimension \n0 0.2654 0.4601 0.11890 \n1 0.1860 0.2750 0.08902 \n2 0.2430 0.3613 0.08758 \n3 0.2575 0.6638 0.17300 \n4 0.1625 0.2364 0.07678 \n\n[5 rows x 30 columns]","text/html":"
| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 |
| 1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | 0.2597 | 0.09744 | ... | 14.91 | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 |
| 4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 |
5 rows × 30 columns