hassaanik committed on
Commit
d6c8342
1 Parent(s): 9ef73ed

Upload 2 files

Notebooks/Diabetes Classification.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
Notebooks/Medicine Classification.ipynb ADDED
@@ -0,0 +1,1462 @@
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 0,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": [],
7
+ "collapsed_sections": [
8
+ "nMPIMsBXnowt",
9
+ "zWz1-JCKnudh",
10
+ "gFUsQMWP87EE",
11
+ "6TbYU2UKn0DJ",
12
+ "qyMS8mQnn2Dx"
13
+ ]
14
+ },
15
+ "kernelspec": {
16
+ "name": "python3",
17
+ "display_name": "Python 3"
18
+ },
19
+ "language_info": {
20
+ "name": "python"
21
+ }
22
+ },
23
+ "cells": [
24
+ {
25
+ "cell_type": "markdown",
26
+ "source": [
27
+ "### Data Preparation"
28
+ ],
29
+ "metadata": {
30
+ "id": "nMPIMsBXnowt"
31
+ }
32
+ },
33
+ {
34
+ "cell_type": "code",
35
+ "execution_count": null,
36
+ "metadata": {
37
+ "colab": {
38
+ "base_uri": "https://localhost:8080/"
39
+ },
40
+ "id": "nrcgcY0HWd3u",
41
+ "outputId": "998fc695-b11e-4648-ac5f-d2b73f88e306"
42
+ },
43
+ "outputs": [
44
+ {
45
+ "output_type": "stream",
46
+ "name": "stdout",
47
+ "text": [
48
+ "Collecting opendatasets\n",
49
+ " Downloading opendatasets-0.1.22-py3-none-any.whl.metadata (9.2 kB)\n",
50
+ "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from opendatasets) (4.66.5)\n",
51
+ "Requirement already satisfied: kaggle in /usr/local/lib/python3.10/dist-packages (from opendatasets) (1.6.17)\n",
52
+ "Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from opendatasets) (8.1.7)\n",
53
+ "Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (1.16.0)\n",
54
+ "Requirement already satisfied: certifi>=2023.7.22 in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (2024.7.4)\n",
55
+ "Requirement already satisfied: python-dateutil in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (2.8.2)\n",
56
+ "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (2.32.3)\n",
57
+ "Requirement already satisfied: python-slugify in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (8.0.4)\n",
58
+ "Requirement already satisfied: urllib3 in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (2.0.7)\n",
59
+ "Requirement already satisfied: bleach in /usr/local/lib/python3.10/dist-packages (from kaggle->opendatasets) (6.1.0)\n",
60
+ "Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from bleach->kaggle->opendatasets) (0.5.1)\n",
61
+ "Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.10/dist-packages (from python-slugify->kaggle->opendatasets) (1.3)\n",
62
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->kaggle->opendatasets) (3.3.2)\n",
63
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->kaggle->opendatasets) (3.7)\n",
64
+ "Downloading opendatasets-0.1.22-py3-none-any.whl (15 kB)\n",
65
+ "Installing collected packages: opendatasets\n",
66
+ "Successfully installed opendatasets-0.1.22\n"
67
+ ]
68
+ }
69
+ ],
70
+ "source": [
71
+ "!pip install opendatasets"
72
+ ]
73
+ },
74
+ {
75
+ "cell_type": "code",
76
+ "source": [
77
+ "import opendatasets as od\n",
78
+ "od.download(\"https://www.kaggle.com/datasets/prasad22/healthcare-dataset\")"
79
+ ],
80
+ "metadata": {
81
+ "colab": {
82
+ "base_uri": "https://localhost:8080/"
83
+ },
84
+ "id": "I6P_5cbGWnYv",
85
+ "outputId": "fbdbc44f-1ab0-49be-a17c-7bfe0aa77c12"
86
+ },
87
+ "execution_count": null,
88
+ "outputs": [
89
+ {
90
+ "output_type": "stream",
91
+ "name": "stdout",
92
+ "text": [
93
+ "Dataset URL: https://www.kaggle.com/datasets/prasad22/healthcare-dataset\n",
94
+ "Downloading healthcare-dataset.zip to ./healthcare-dataset\n"
95
+ ]
96
+ },
97
+ {
98
+ "output_type": "stream",
99
+ "name": "stderr",
100
+ "text": [
101
+ "100%|██████████| 2.91M/2.91M [00:00<00:00, 46.8MB/s]"
102
+ ]
103
+ },
104
+ {
105
+ "output_type": "stream",
106
+ "name": "stdout",
107
+ "text": [
108
+ "\n"
109
+ ]
110
+ },
111
+ {
112
+ "output_type": "stream",
113
+ "name": "stderr",
114
+ "text": [
115
+ "\n"
116
+ ]
117
+ }
118
+ ]
119
+ },
120
+ {
121
+ "cell_type": "code",
122
+ "source": [
123
+ "import pandas as pd\n",
124
+ "df = pd.read_csv(\"/content/healthcare_dataset.csv\")\n",
125
+ "df = df[['Age','Gender','Blood Type','Medical Condition','Test Results','Medication']]"
126
+ ],
127
+ "metadata": {
128
+ "colab": {
129
+ "base_uri": "https://localhost:8080/",
130
+ "height": 423
131
+ },
132
+ "collapsed": true,
133
+ "id": "q-CvLDIMWrs5",
134
+ "outputId": "303ca4ec-f55a-4ffc-9466-c7b241521a4c"
135
+ },
136
+ "execution_count": null,
137
+ "outputs": [
138
+ {
139
+ "output_type": "execute_result",
140
+ "data": {
141
+ "text/plain": [
142
+ " Age Gender Blood Type Medical Condition Test Results Medication\n",
143
+ "0 30 Male B- Cancer Normal Paracetamol\n",
144
+ "1 62 Male A+ Obesity Inconclusive Ibuprofen\n",
145
+ "2 76 Female A- Obesity Normal Aspirin\n",
146
+ "3 28 Female O+ Diabetes Abnormal Ibuprofen\n",
147
+ "4 43 Female AB+ Cancer Abnormal Penicillin\n",
148
+ "... ... ... ... ... ... ...\n",
149
+ "55495 42 Female O+ Asthma Abnormal Penicillin\n",
150
+ "55496 61 Female AB- Obesity Normal Aspirin\n",
151
+ "55497 38 Female B+ Hypertension Abnormal Ibuprofen\n",
152
+ "55498 43 Male O- Arthritis Abnormal Ibuprofen\n",
153
+ "55499 53 Female O+ Arthritis Abnormal Ibuprofen\n",
154
+ "\n",
155
+ "[55500 rows x 6 columns]"
156
+ ],
157
+ "text/html": [
158
+ "\n",
159
+ " <div id=\"df-372322f2-72d2-45a5-903b-c7d201ee51c9\" class=\"colab-df-container\">\n",
160
+ " <div>\n",
161
+ "<style scoped>\n",
162
+ " .dataframe tbody tr th:only-of-type {\n",
163
+ " vertical-align: middle;\n",
164
+ " }\n",
165
+ "\n",
166
+ " .dataframe tbody tr th {\n",
167
+ " vertical-align: top;\n",
168
+ " }\n",
169
+ "\n",
170
+ " .dataframe thead th {\n",
171
+ " text-align: right;\n",
172
+ " }\n",
173
+ "</style>\n",
174
+ "<table border=\"1\" class=\"dataframe\">\n",
175
+ " <thead>\n",
176
+ " <tr style=\"text-align: right;\">\n",
177
+ " <th></th>\n",
178
+ " <th>Age</th>\n",
179
+ " <th>Gender</th>\n",
180
+ " <th>Blood Type</th>\n",
181
+ " <th>Medical Condition</th>\n",
182
+ " <th>Test Results</th>\n",
183
+ " <th>Medication</th>\n",
184
+ " </tr>\n",
185
+ " </thead>\n",
186
+ " <tbody>\n",
187
+ " <tr>\n",
188
+ " <th>0</th>\n",
189
+ " <td>30</td>\n",
190
+ " <td>Male</td>\n",
191
+ " <td>B-</td>\n",
192
+ " <td>Cancer</td>\n",
193
+ " <td>Normal</td>\n",
194
+ " <td>Paracetamol</td>\n",
195
+ " </tr>\n",
196
+ " <tr>\n",
197
+ " <th>1</th>\n",
198
+ " <td>62</td>\n",
199
+ " <td>Male</td>\n",
200
+ " <td>A+</td>\n",
201
+ " <td>Obesity</td>\n",
202
+ " <td>Inconclusive</td>\n",
203
+ " <td>Ibuprofen</td>\n",
204
+ " </tr>\n",
205
+ " <tr>\n",
206
+ " <th>2</th>\n",
207
+ " <td>76</td>\n",
208
+ " <td>Female</td>\n",
209
+ " <td>A-</td>\n",
210
+ " <td>Obesity</td>\n",
211
+ " <td>Normal</td>\n",
212
+ " <td>Aspirin</td>\n",
213
+ " </tr>\n",
214
+ " <tr>\n",
215
+ " <th>3</th>\n",
216
+ " <td>28</td>\n",
217
+ " <td>Female</td>\n",
218
+ " <td>O+</td>\n",
219
+ " <td>Diabetes</td>\n",
220
+ " <td>Abnormal</td>\n",
221
+ " <td>Ibuprofen</td>\n",
222
+ " </tr>\n",
223
+ " <tr>\n",
224
+ " <th>4</th>\n",
225
+ " <td>43</td>\n",
226
+ " <td>Female</td>\n",
227
+ " <td>AB+</td>\n",
228
+ " <td>Cancer</td>\n",
229
+ " <td>Abnormal</td>\n",
230
+ " <td>Penicillin</td>\n",
231
+ " </tr>\n",
232
+ " <tr>\n",
233
+ " <th>...</th>\n",
234
+ " <td>...</td>\n",
235
+ " <td>...</td>\n",
236
+ " <td>...</td>\n",
237
+ " <td>...</td>\n",
238
+ " <td>...</td>\n",
239
+ " <td>...</td>\n",
240
+ " </tr>\n",
241
+ " <tr>\n",
242
+ " <th>55495</th>\n",
243
+ " <td>42</td>\n",
244
+ " <td>Female</td>\n",
245
+ " <td>O+</td>\n",
246
+ " <td>Asthma</td>\n",
247
+ " <td>Abnormal</td>\n",
248
+ " <td>Penicillin</td>\n",
249
+ " </tr>\n",
250
+ " <tr>\n",
251
+ " <th>55496</th>\n",
252
+ " <td>61</td>\n",
253
+ " <td>Female</td>\n",
254
+ " <td>AB-</td>\n",
255
+ " <td>Obesity</td>\n",
256
+ " <td>Normal</td>\n",
257
+ " <td>Aspirin</td>\n",
258
+ " </tr>\n",
259
+ " <tr>\n",
260
+ " <th>55497</th>\n",
261
+ " <td>38</td>\n",
262
+ " <td>Female</td>\n",
263
+ " <td>B+</td>\n",
264
+ " <td>Hypertension</td>\n",
265
+ " <td>Abnormal</td>\n",
266
+ " <td>Ibuprofen</td>\n",
267
+ " </tr>\n",
268
+ " <tr>\n",
269
+ " <th>55498</th>\n",
270
+ " <td>43</td>\n",
271
+ " <td>Male</td>\n",
272
+ " <td>O-</td>\n",
273
+ " <td>Arthritis</td>\n",
274
+ " <td>Abnormal</td>\n",
275
+ " <td>Ibuprofen</td>\n",
276
+ " </tr>\n",
277
+ " <tr>\n",
278
+ " <th>55499</th>\n",
279
+ " <td>53</td>\n",
280
+ " <td>Female</td>\n",
281
+ " <td>O+</td>\n",
282
+ " <td>Arthritis</td>\n",
283
+ " <td>Abnormal</td>\n",
284
+ " <td>Ibuprofen</td>\n",
285
+ " </tr>\n",
286
+ " </tbody>\n",
287
+ "</table>\n",
288
+ "<p>55500 rows × 6 columns</p>\n",
289
+ "</div>\n",
290
+ " <div class=\"colab-df-buttons\">\n",
291
+ "\n",
292
+ " <div class=\"colab-df-container\">\n",
293
+ " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-372322f2-72d2-45a5-903b-c7d201ee51c9')\"\n",
294
+ " title=\"Convert this dataframe to an interactive table.\"\n",
295
+ " style=\"display:none;\">\n",
296
+ "\n",
297
+ " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
298
+ " <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
299
+ " </svg>\n",
300
+ " </button>\n",
301
+ "\n",
302
+ " <style>\n",
303
+ " .colab-df-container {\n",
304
+ " display:flex;\n",
305
+ " gap: 12px;\n",
306
+ " }\n",
307
+ "\n",
308
+ " .colab-df-convert {\n",
309
+ " background-color: #E8F0FE;\n",
310
+ " border: none;\n",
311
+ " border-radius: 50%;\n",
312
+ " cursor: pointer;\n",
313
+ " display: none;\n",
314
+ " fill: #1967D2;\n",
315
+ " height: 32px;\n",
316
+ " padding: 0 0 0 0;\n",
317
+ " width: 32px;\n",
318
+ " }\n",
319
+ "\n",
320
+ " .colab-df-convert:hover {\n",
321
+ " background-color: #E2EBFA;\n",
322
+ " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
323
+ " fill: #174EA6;\n",
324
+ " }\n",
325
+ "\n",
326
+ " .colab-df-buttons div {\n",
327
+ " margin-bottom: 4px;\n",
328
+ " }\n",
329
+ "\n",
330
+ " [theme=dark] .colab-df-convert {\n",
331
+ " background-color: #3B4455;\n",
332
+ " fill: #D2E3FC;\n",
333
+ " }\n",
334
+ "\n",
335
+ " [theme=dark] .colab-df-convert:hover {\n",
336
+ " background-color: #434B5C;\n",
337
+ " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
338
+ " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
339
+ " fill: #FFFFFF;\n",
340
+ " }\n",
341
+ " </style>\n",
342
+ "\n",
343
+ " <script>\n",
344
+ " const buttonEl =\n",
345
+ " document.querySelector('#df-372322f2-72d2-45a5-903b-c7d201ee51c9 button.colab-df-convert');\n",
346
+ " buttonEl.style.display =\n",
347
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
348
+ "\n",
349
+ " async function convertToInteractive(key) {\n",
350
+ " const element = document.querySelector('#df-372322f2-72d2-45a5-903b-c7d201ee51c9');\n",
351
+ " const dataTable =\n",
352
+ " await google.colab.kernel.invokeFunction('convertToInteractive',\n",
353
+ " [key], {});\n",
354
+ " if (!dataTable) return;\n",
355
+ "\n",
356
+ " const docLinkHtml = 'Like what you see? Visit the ' +\n",
357
+ " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
358
+ " + ' to learn more about interactive tables.';\n",
359
+ " element.innerHTML = '';\n",
360
+ " dataTable['output_type'] = 'display_data';\n",
361
+ " await google.colab.output.renderOutput(dataTable, element);\n",
362
+ " const docLink = document.createElement('div');\n",
363
+ " docLink.innerHTML = docLinkHtml;\n",
364
+ " element.appendChild(docLink);\n",
365
+ " }\n",
366
+ " </script>\n",
367
+ " </div>\n",
368
+ "\n",
369
+ "\n",
370
+ "<div id=\"df-c3874bd4-bc8e-4fd2-8f65-372c533ad3b7\">\n",
371
+ " <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-c3874bd4-bc8e-4fd2-8f65-372c533ad3b7')\"\n",
372
+ " title=\"Suggest charts\"\n",
373
+ " style=\"display:none;\">\n",
374
+ "\n",
375
+ "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
376
+ " width=\"24px\">\n",
377
+ " <g>\n",
378
+ " <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
379
+ " </g>\n",
380
+ "</svg>\n",
381
+ " </button>\n",
382
+ "\n",
383
+ "<style>\n",
384
+ " .colab-df-quickchart {\n",
385
+ " --bg-color: #E8F0FE;\n",
386
+ " --fill-color: #1967D2;\n",
387
+ " --hover-bg-color: #E2EBFA;\n",
388
+ " --hover-fill-color: #174EA6;\n",
389
+ " --disabled-fill-color: #AAA;\n",
390
+ " --disabled-bg-color: #DDD;\n",
391
+ " }\n",
392
+ "\n",
393
+ " [theme=dark] .colab-df-quickchart {\n",
394
+ " --bg-color: #3B4455;\n",
395
+ " --fill-color: #D2E3FC;\n",
396
+ " --hover-bg-color: #434B5C;\n",
397
+ " --hover-fill-color: #FFFFFF;\n",
398
+ " --disabled-bg-color: #3B4455;\n",
399
+ " --disabled-fill-color: #666;\n",
400
+ " }\n",
401
+ "\n",
402
+ " .colab-df-quickchart {\n",
403
+ " background-color: var(--bg-color);\n",
404
+ " border: none;\n",
405
+ " border-radius: 50%;\n",
406
+ " cursor: pointer;\n",
407
+ " display: none;\n",
408
+ " fill: var(--fill-color);\n",
409
+ " height: 32px;\n",
410
+ " padding: 0;\n",
411
+ " width: 32px;\n",
412
+ " }\n",
413
+ "\n",
414
+ " .colab-df-quickchart:hover {\n",
415
+ " background-color: var(--hover-bg-color);\n",
416
+ " box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
417
+ " fill: var(--button-hover-fill-color);\n",
418
+ " }\n",
419
+ "\n",
420
+ " .colab-df-quickchart-complete:disabled,\n",
421
+ " .colab-df-quickchart-complete:disabled:hover {\n",
422
+ " background-color: var(--disabled-bg-color);\n",
423
+ " fill: var(--disabled-fill-color);\n",
424
+ " box-shadow: none;\n",
425
+ " }\n",
426
+ "\n",
427
+ " .colab-df-spinner {\n",
428
+ " border: 2px solid var(--fill-color);\n",
429
+ " border-color: transparent;\n",
430
+ " border-bottom-color: var(--fill-color);\n",
431
+ " animation:\n",
432
+ " spin 1s steps(1) infinite;\n",
433
+ " }\n",
434
+ "\n",
435
+ " @keyframes spin {\n",
436
+ " 0% {\n",
437
+ " border-color: transparent;\n",
438
+ " border-bottom-color: var(--fill-color);\n",
439
+ " border-left-color: var(--fill-color);\n",
440
+ " }\n",
441
+ " 20% {\n",
442
+ " border-color: transparent;\n",
443
+ " border-left-color: var(--fill-color);\n",
444
+ " border-top-color: var(--fill-color);\n",
445
+ " }\n",
446
+ " 30% {\n",
447
+ " border-color: transparent;\n",
448
+ " border-left-color: var(--fill-color);\n",
449
+ " border-top-color: var(--fill-color);\n",
450
+ " border-right-color: var(--fill-color);\n",
451
+ " }\n",
452
+ " 40% {\n",
453
+ " border-color: transparent;\n",
454
+ " border-right-color: var(--fill-color);\n",
455
+ " border-top-color: var(--fill-color);\n",
456
+ " }\n",
457
+ " 60% {\n",
458
+ " border-color: transparent;\n",
459
+ " border-right-color: var(--fill-color);\n",
460
+ " }\n",
461
+ " 80% {\n",
462
+ " border-color: transparent;\n",
463
+ " border-right-color: var(--fill-color);\n",
464
+ " border-bottom-color: var(--fill-color);\n",
465
+ " }\n",
466
+ " 90% {\n",
467
+ " border-color: transparent;\n",
468
+ " border-bottom-color: var(--fill-color);\n",
469
+ " }\n",
470
+ " }\n",
471
+ "</style>\n",
472
+ "\n",
473
+ " <script>\n",
474
+ " async function quickchart(key) {\n",
475
+ " const quickchartButtonEl =\n",
476
+ " document.querySelector('#' + key + ' button');\n",
477
+ " quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
478
+ " quickchartButtonEl.classList.add('colab-df-spinner');\n",
479
+ " try {\n",
480
+ " const charts = await google.colab.kernel.invokeFunction(\n",
481
+ " 'suggestCharts', [key], {});\n",
482
+ " } catch (error) {\n",
483
+ " console.error('Error during call to suggestCharts:', error);\n",
484
+ " }\n",
485
+ " quickchartButtonEl.classList.remove('colab-df-spinner');\n",
486
+ " quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
487
+ " }\n",
488
+ " (() => {\n",
489
+ " let quickchartButtonEl =\n",
490
+ " document.querySelector('#df-c3874bd4-bc8e-4fd2-8f65-372c533ad3b7 button');\n",
491
+ " quickchartButtonEl.style.display =\n",
492
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
493
+ " })();\n",
494
+ " </script>\n",
495
+ "</div>\n",
496
+ "\n",
497
+ " <div id=\"id_a4b005c8-fefc-4810-82ac-33230aa22be4\">\n",
498
+ " <style>\n",
499
+ " .colab-df-generate {\n",
500
+ " background-color: #E8F0FE;\n",
501
+ " border: none;\n",
502
+ " border-radius: 50%;\n",
503
+ " cursor: pointer;\n",
504
+ " display: none;\n",
505
+ " fill: #1967D2;\n",
506
+ " height: 32px;\n",
507
+ " padding: 0 0 0 0;\n",
508
+ " width: 32px;\n",
509
+ " }\n",
510
+ "\n",
511
+ " .colab-df-generate:hover {\n",
512
+ " background-color: #E2EBFA;\n",
513
+ " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
514
+ " fill: #174EA6;\n",
515
+ " }\n",
516
+ "\n",
517
+ " [theme=dark] .colab-df-generate {\n",
518
+ " background-color: #3B4455;\n",
519
+ " fill: #D2E3FC;\n",
520
+ " }\n",
521
+ "\n",
522
+ " [theme=dark] .colab-df-generate:hover {\n",
523
+ " background-color: #434B5C;\n",
524
+ " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
525
+ " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
526
+ " fill: #FFFFFF;\n",
527
+ " }\n",
528
+ " </style>\n",
529
+ " <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df')\"\n",
530
+ " title=\"Generate code using this dataframe.\"\n",
531
+ " style=\"display:none;\">\n",
532
+ "\n",
533
+ " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
534
+ " width=\"24px\">\n",
535
+ " <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
536
+ " </svg>\n",
537
+ " </button>\n",
538
+ " <script>\n",
539
+ " (() => {\n",
540
+ " const buttonEl =\n",
541
+ " document.querySelector('#id_a4b005c8-fefc-4810-82ac-33230aa22be4 button.colab-df-generate');\n",
542
+ " buttonEl.style.display =\n",
543
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
544
+ "\n",
545
+ " buttonEl.onclick = () => {\n",
546
+ " google.colab.notebook.generateWithVariable('df');\n",
547
+ " }\n",
548
+ " })();\n",
549
+ " </script>\n",
550
+ " </div>\n",
551
+ "\n",
552
+ " </div>\n",
553
+ " </div>\n"
554
+ ],
555
+ "application/vnd.google.colaboratory.intrinsic+json": {
556
+ "type": "dataframe",
557
+ "variable_name": "df",
558
+ "summary": "{\n \"name\": \"df\",\n \"rows\": 55500,\n \"fields\": [\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 19,\n \"min\": 13,\n \"max\": 89,\n \"num_unique_values\": 77,\n \"samples\": [\n 43,\n 22,\n 72\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Gender\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"Female\",\n \"Male\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Blood Type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 8,\n \"samples\": [\n \"A+\",\n \"AB-\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Medical Condition\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Cancer\",\n \"Obesity\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Test Results\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Normal\",\n \"Inconclusive\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Medication\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Ibuprofen\",\n \"Lipitor\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
559
+ }
560
+ },
561
+ "metadata": {},
562
+ "execution_count": 1
563
+ }
564
+ ]
565
+ },
566
+ {
567
+ "cell_type": "code",
568
+ "source": [
569
+ "df['Test Results'].value_counts()"
570
+ ],
571
+ "metadata": {
572
+ "colab": {
573
+ "base_uri": "https://localhost:8080/",
574
+ "height": 209
575
+ },
576
+ "id": "PUq7tSYfWzTl",
577
+ "outputId": "897d2af8-e1d5-4864-f88f-7c99fd14fed8"
578
+ },
579
+ "execution_count": null,
580
+ "outputs": [
581
+ {
582
+ "output_type": "execute_result",
583
+ "data": {
584
+ "text/plain": [
585
+ "Test Results\n",
586
+ "Abnormal 18627\n",
587
+ "Normal 18517\n",
588
+ "Inconclusive 18356\n",
589
+ "Name: count, dtype: int64"
590
+ ],
591
+ "text/html": [
592
+ "<div>\n",
593
+ "<style scoped>\n",
594
+ " .dataframe tbody tr th:only-of-type {\n",
595
+ " vertical-align: middle;\n",
596
+ " }\n",
597
+ "\n",
598
+ " .dataframe tbody tr th {\n",
599
+ " vertical-align: top;\n",
600
+ " }\n",
601
+ "\n",
602
+ " .dataframe thead th {\n",
603
+ " text-align: right;\n",
604
+ " }\n",
605
+ "</style>\n",
606
+ "<table border=\"1\" class=\"dataframe\">\n",
607
+ " <thead>\n",
608
+ " <tr style=\"text-align: right;\">\n",
609
+ " <th></th>\n",
610
+ " <th>count</th>\n",
611
+ " </tr>\n",
612
+ " <tr>\n",
613
+ " <th>Test Results</th>\n",
614
+ " <th></th>\n",
615
+ " </tr>\n",
616
+ " </thead>\n",
617
+ " <tbody>\n",
618
+ " <tr>\n",
619
+ " <th>Abnormal</th>\n",
620
+ " <td>18627</td>\n",
621
+ " </tr>\n",
622
+ " <tr>\n",
623
+ " <th>Normal</th>\n",
624
+ " <td>18517</td>\n",
625
+ " </tr>\n",
626
+ " <tr>\n",
627
+ " <th>Inconclusive</th>\n",
628
+ " <td>18356</td>\n",
629
+ " </tr>\n",
630
+ " </tbody>\n",
631
+ "</table>\n",
632
+ "</div><br><label><b>dtype:</b> int64</label>"
633
+ ]
634
+ },
635
+ "metadata": {},
636
+ "execution_count": 6
637
+ }
638
+ ]
639
+ },
640
+ {
641
+ "cell_type": "code",
642
+ "source": [
643
+ "df['Medical Condition'].value_counts()"
644
+ ],
645
+ "metadata": {
646
+ "colab": {
647
+ "base_uri": "https://localhost:8080/",
648
+ "height": 303
649
+ },
650
+ "id": "LprQu4JKXg5v",
651
+ "outputId": "3d467787-7747-471b-9bc0-7151943dbef5"
652
+ },
653
+ "execution_count": null,
654
+ "outputs": [
655
+ {
656
+ "output_type": "execute_result",
657
+ "data": {
658
+ "text/plain": [
659
+ "Medical Condition\n",
660
+ "Arthritis 9308\n",
661
+ "Diabetes 9304\n",
662
+ "Hypertension 9245\n",
663
+ "Obesity 9231\n",
664
+ "Cancer 9227\n",
665
+ "Asthma 9185\n",
666
+ "Name: count, dtype: int64"
667
+ ],
668
+ "text/html": [
669
+ "<div>\n",
670
+ "<style scoped>\n",
671
+ " .dataframe tbody tr th:only-of-type {\n",
672
+ " vertical-align: middle;\n",
673
+ " }\n",
674
+ "\n",
675
+ " .dataframe tbody tr th {\n",
676
+ " vertical-align: top;\n",
677
+ " }\n",
678
+ "\n",
679
+ " .dataframe thead th {\n",
680
+ " text-align: right;\n",
681
+ " }\n",
682
+ "</style>\n",
683
+ "<table border=\"1\" class=\"dataframe\">\n",
684
+ " <thead>\n",
685
+ " <tr style=\"text-align: right;\">\n",
686
+ " <th></th>\n",
687
+ " <th>count</th>\n",
688
+ " </tr>\n",
689
+ " <tr>\n",
690
+ " <th>Medical Condition</th>\n",
691
+ " <th></th>\n",
692
+ " </tr>\n",
693
+ " </thead>\n",
694
+ " <tbody>\n",
695
+ " <tr>\n",
696
+ " <th>Arthritis</th>\n",
697
+ " <td>9308</td>\n",
698
+ " </tr>\n",
699
+ " <tr>\n",
700
+ " <th>Diabetes</th>\n",
701
+ " <td>9304</td>\n",
702
+ " </tr>\n",
703
+ " <tr>\n",
704
+ " <th>Hypertension</th>\n",
705
+ " <td>9245</td>\n",
706
+ " </tr>\n",
707
+ " <tr>\n",
708
+ " <th>Obesity</th>\n",
709
+ " <td>9231</td>\n",
710
+ " </tr>\n",
711
+ " <tr>\n",
712
+ " <th>Cancer</th>\n",
713
+ " <td>9227</td>\n",
714
+ " </tr>\n",
715
+ " <tr>\n",
716
+ " <th>Asthma</th>\n",
717
+ " <td>9185</td>\n",
718
+ " </tr>\n",
719
+ " </tbody>\n",
720
+ "</table>\n",
721
+ "</div><br><label><b>dtype:</b> int64</label>"
722
+ ]
723
+ },
724
+ "metadata": {},
725
+ "execution_count": 7
726
+ }
727
+ ]
728
+ },
729
+ {
730
+ "cell_type": "code",
731
+ "source": [
732
+ "df['Blood Type'].value_counts()"
733
+ ],
734
+ "metadata": {
735
+ "colab": {
736
+ "base_uri": "https://localhost:8080/",
737
+ "height": 366
738
+ },
739
+ "id": "PklqIg_QX1LK",
740
+ "outputId": "928ad232-a538-44dd-8568-36c3de23886b"
741
+ },
742
+ "execution_count": null,
743
+ "outputs": [
744
+ {
745
+ "output_type": "execute_result",
746
+ "data": {
747
+ "text/plain": [
748
+ "Blood Type\n",
749
+ "A- 6969\n",
750
+ "A+ 6956\n",
751
+ "AB+ 6947\n",
752
+ "AB- 6945\n",
753
+ "B+ 6945\n",
754
+ "B- 6944\n",
755
+ "O+ 6917\n",
756
+ "O- 6877\n",
757
+ "Name: count, dtype: int64"
758
+ ],
759
+ "text/html": [
760
+ "<div>\n",
761
+ "<style scoped>\n",
762
+ " .dataframe tbody tr th:only-of-type {\n",
763
+ " vertical-align: middle;\n",
764
+ " }\n",
765
+ "\n",
766
+ " .dataframe tbody tr th {\n",
767
+ " vertical-align: top;\n",
768
+ " }\n",
769
+ "\n",
770
+ " .dataframe thead th {\n",
771
+ " text-align: right;\n",
772
+ " }\n",
773
+ "</style>\n",
774
+ "<table border=\"1\" class=\"dataframe\">\n",
775
+ " <thead>\n",
776
+ " <tr style=\"text-align: right;\">\n",
777
+ " <th></th>\n",
778
+ " <th>count</th>\n",
779
+ " </tr>\n",
780
+ " <tr>\n",
781
+ " <th>Blood Type</th>\n",
782
+ " <th></th>\n",
783
+ " </tr>\n",
784
+ " </thead>\n",
785
+ " <tbody>\n",
786
+ " <tr>\n",
787
+ " <th>A-</th>\n",
788
+ " <td>6969</td>\n",
789
+ " </tr>\n",
790
+ " <tr>\n",
791
+ " <th>A+</th>\n",
792
+ " <td>6956</td>\n",
793
+ " </tr>\n",
794
+ " <tr>\n",
795
+ " <th>AB+</th>\n",
796
+ " <td>6947</td>\n",
797
+ " </tr>\n",
798
+ " <tr>\n",
799
+ " <th>AB-</th>\n",
800
+ " <td>6945</td>\n",
801
+ " </tr>\n",
802
+ " <tr>\n",
803
+ " <th>B+</th>\n",
804
+ " <td>6945</td>\n",
805
+ " </tr>\n",
806
+ " <tr>\n",
807
+ " <th>B-</th>\n",
808
+ " <td>6944</td>\n",
809
+ " </tr>\n",
810
+ " <tr>\n",
811
+ " <th>O+</th>\n",
812
+ " <td>6917</td>\n",
813
+ " </tr>\n",
814
+ " <tr>\n",
815
+ " <th>O-</th>\n",
816
+ " <td>6877</td>\n",
817
+ " </tr>\n",
818
+ " </tbody>\n",
819
+ "</table>\n",
820
+ "</div><br><label><b>dtype:</b> int64</label>"
821
+ ]
822
+ },
823
+ "metadata": {},
824
+ "execution_count": 8
825
+ }
826
+ ]
827
+ },
828
+ {
829
+ "cell_type": "code",
830
+ "source": [
831
+ "df['Medication'].value_counts()"
832
+ ],
833
+ "metadata": {
834
+ "colab": {
835
+ "base_uri": "https://localhost:8080/",
836
+ "height": 272
837
+ },
838
+ "id": "QGBA7xBnX8zA",
839
+ "outputId": "4bfd0a84-e68a-4553-d918-2d684dde6dc9"
840
+ },
841
+ "execution_count": null,
842
+ "outputs": [
843
+ {
844
+ "output_type": "execute_result",
845
+ "data": {
846
+ "text/plain": [
847
+ "Medication\n",
848
+ "Lipitor 11140\n",
849
+ "Ibuprofen 11127\n",
850
+ "Aspirin 11094\n",
851
+ "Paracetamol 11071\n",
852
+ "Penicillin 11068\n",
853
+ "Name: count, dtype: int64"
854
+ ],
855
+ "text/html": [
856
+ "<div>\n",
857
+ "<style scoped>\n",
858
+ " .dataframe tbody tr th:only-of-type {\n",
859
+ " vertical-align: middle;\n",
860
+ " }\n",
861
+ "\n",
862
+ " .dataframe tbody tr th {\n",
863
+ " vertical-align: top;\n",
864
+ " }\n",
865
+ "\n",
866
+ " .dataframe thead th {\n",
867
+ " text-align: right;\n",
868
+ " }\n",
869
+ "</style>\n",
870
+ "<table border=\"1\" class=\"dataframe\">\n",
871
+ " <thead>\n",
872
+ " <tr style=\"text-align: right;\">\n",
873
+ " <th></th>\n",
874
+ " <th>count</th>\n",
875
+ " </tr>\n",
876
+ " <tr>\n",
877
+ " <th>Medication</th>\n",
878
+ " <th></th>\n",
879
+ " </tr>\n",
880
+ " </thead>\n",
881
+ " <tbody>\n",
882
+ " <tr>\n",
883
+ " <th>Lipitor</th>\n",
884
+ " <td>11140</td>\n",
885
+ " </tr>\n",
886
+ " <tr>\n",
887
+ " <th>Ibuprofen</th>\n",
888
+ " <td>11127</td>\n",
889
+ " </tr>\n",
890
+ " <tr>\n",
891
+ " <th>Aspirin</th>\n",
892
+ " <td>11094</td>\n",
893
+ " </tr>\n",
894
+ " <tr>\n",
895
+ " <th>Paracetamol</th>\n",
896
+ " <td>11071</td>\n",
897
+ " </tr>\n",
898
+ " <tr>\n",
899
+ " <th>Penicillin</th>\n",
900
+ " <td>11068</td>\n",
901
+ " </tr>\n",
902
+ " </tbody>\n",
903
+ "</table>\n",
904
+ "</div><br><label><b>dtype:</b> int64</label>"
905
+ ]
906
+ },
907
+ "metadata": {},
908
+ "execution_count": 9
909
+ }
910
+ ]
911
+ },
912
+ {
913
+ "cell_type": "code",
914
+ "source": [
915
+ "df['Gender'].value_counts()"
916
+ ],
917
+ "metadata": {
918
+ "colab": {
919
+ "base_uri": "https://localhost:8080/",
920
+ "height": 178
921
+ },
922
+ "id": "kZV7YperYB4s",
923
+ "outputId": "c0a79c87-9f71-4fd0-dbb7-542b946f4490"
924
+ },
925
+ "execution_count": null,
926
+ "outputs": [
927
+ {
928
+ "output_type": "execute_result",
929
+ "data": {
930
+ "text/plain": [
931
+ "Gender\n",
932
+ "Male 27774\n",
933
+ "Female 27726\n",
934
+ "Name: count, dtype: int64"
935
+ ],
936
+ "text/html": [
937
+ "<div>\n",
938
+ "<style scoped>\n",
939
+ " .dataframe tbody tr th:only-of-type {\n",
940
+ " vertical-align: middle;\n",
941
+ " }\n",
942
+ "\n",
943
+ " .dataframe tbody tr th {\n",
944
+ " vertical-align: top;\n",
945
+ " }\n",
946
+ "\n",
947
+ " .dataframe thead th {\n",
948
+ " text-align: right;\n",
949
+ " }\n",
950
+ "</style>\n",
951
+ "<table border=\"1\" class=\"dataframe\">\n",
952
+ " <thead>\n",
953
+ " <tr style=\"text-align: right;\">\n",
954
+ " <th></th>\n",
955
+ " <th>count</th>\n",
956
+ " </tr>\n",
957
+ " <tr>\n",
958
+ " <th>Gender</th>\n",
959
+ " <th></th>\n",
960
+ " </tr>\n",
961
+ " </thead>\n",
962
+ " <tbody>\n",
963
+ " <tr>\n",
964
+ " <th>Male</th>\n",
965
+ " <td>27774</td>\n",
966
+ " </tr>\n",
967
+ " <tr>\n",
968
+ " <th>Female</th>\n",
969
+ " <td>27726</td>\n",
970
+ " </tr>\n",
971
+ " </tbody>\n",
972
+ "</table>\n",
973
+ "</div><br><label><b>dtype:</b> int64</label>"
974
+ ]
975
+ },
976
+ "metadata": {},
977
+ "execution_count": 10
978
+ }
979
+ ]
980
+ },
981
+ {
982
+ "cell_type": "code",
983
+ "source": [
984
+ "df.isnull().sum()"
985
+ ],
986
+ "metadata": {
987
+ "colab": {
988
+ "base_uri": "https://localhost:8080/",
989
+ "height": 272
990
+ },
991
+ "id": "inBn2HEPYKBk",
992
+ "outputId": "6fd328f2-e84d-47db-df61-3468983ce528"
993
+ },
994
+ "execution_count": null,
995
+ "outputs": [
996
+ {
997
+ "output_type": "execute_result",
998
+ "data": {
999
+ "text/plain": [
1000
+ "Age 0\n",
1001
+ "Gender 0\n",
1002
+ "Blood Type 0\n",
1003
+ "Medical Condition 0\n",
1004
+ "Test Results 0\n",
1005
+ "Medication 0\n",
1006
+ "dtype: int64"
1007
+ ],
1008
+ "text/html": [
1009
+ "<div>\n",
1010
+ "<style scoped>\n",
1011
+ " .dataframe tbody tr th:only-of-type {\n",
1012
+ " vertical-align: middle;\n",
1013
+ " }\n",
1014
+ "\n",
1015
+ " .dataframe tbody tr th {\n",
1016
+ " vertical-align: top;\n",
1017
+ " }\n",
1018
+ "\n",
1019
+ " .dataframe thead th {\n",
1020
+ " text-align: right;\n",
1021
+ " }\n",
1022
+ "</style>\n",
1023
+ "<table border=\"1\" class=\"dataframe\">\n",
1024
+ " <thead>\n",
1025
+ " <tr style=\"text-align: right;\">\n",
1026
+ " <th></th>\n",
1027
+ " <th>0</th>\n",
1028
+ " </tr>\n",
1029
+ " </thead>\n",
1030
+ " <tbody>\n",
1031
+ " <tr>\n",
1032
+ " <th>Age</th>\n",
1033
+ " <td>0</td>\n",
1034
+ " </tr>\n",
1035
+ " <tr>\n",
1036
+ " <th>Gender</th>\n",
1037
+ " <td>0</td>\n",
1038
+ " </tr>\n",
1039
+ " <tr>\n",
1040
+ " <th>Blood Type</th>\n",
1041
+ " <td>0</td>\n",
1042
+ " </tr>\n",
1043
+ " <tr>\n",
1044
+ " <th>Medical Condition</th>\n",
1045
+ " <td>0</td>\n",
1046
+ " </tr>\n",
1047
+ " <tr>\n",
1048
+ " <th>Test Results</th>\n",
1049
+ " <td>0</td>\n",
1050
+ " </tr>\n",
1051
+ " <tr>\n",
1052
+ " <th>Medication</th>\n",
1053
+ " <td>0</td>\n",
1054
+ " </tr>\n",
1055
+ " </tbody>\n",
1056
+ "</table>\n",
1057
+ "</div><br><label><b>dtype:</b> int64</label>"
1058
+ ]
1059
+ },
1060
+ "metadata": {},
1061
+ "execution_count": 11
1062
+ }
1063
+ ]
1064
+ },
1065
+ {
1066
+ "cell_type": "code",
1067
+ "source": [
1068
+ "from sklearn.preprocessing import LabelEncoder\n",
1069
+ "\n",
1070
+ "# Encode categorical features\n",
1071
+ "label_encoders = {}\n",
1072
+ "for column in ['Gender', 'Blood Type', 'Medical Condition', 'Test Results']:\n",
1073
+ " le = LabelEncoder()\n",
1074
+ " df[column] = le.fit_transform(df[column])\n",
1075
+ " label_encoders[column] = le\n",
1076
+ "\n",
1077
+ "# Encode the target variable\n",
1078
+ "target_encoder = LabelEncoder()\n",
1079
+ "df['Medication'] = target_encoder.fit_transform(df['Medication'])"
1080
+ ],
1081
+ "metadata": {
1082
+ "id": "5EDh_scLZF_N"
1083
+ },
1084
+ "execution_count": null,
1085
+ "outputs": []
1086
+ },
1087
+ {
1088
+ "cell_type": "code",
1089
+ "source": [
1090
+ "from sklearn.model_selection import train_test_split\n",
1091
+ "\n",
1092
+ "# Define features and target\n",
1093
+ "X = df[['Age', 'Gender', 'Blood Type', 'Medical Condition', 'Test Results']]\n",
1094
+ "y = df['Medication']\n",
1095
+ "\n",
1096
+ "# Split the data\n",
1097
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
1098
+ ],
1099
+ "metadata": {
1100
+ "id": "NRwGc4aQZMP0"
1101
+ },
1102
+ "execution_count": null,
1103
+ "outputs": []
1104
+ },
1105
+ {
1106
+ "cell_type": "code",
1107
+ "source": [
1108
+ "len(X_train), len(X_test), len(y_train), len(y_test)"
1109
+ ],
1110
+ "metadata": {
1111
+ "colab": {
1112
+ "base_uri": "https://localhost:8080/"
1113
+ },
1114
+ "id": "rpcJAbA_ZeN-",
1115
+ "outputId": "01fcf1b0-5b45-4dbb-ee95-57e9361e2f91"
1116
+ },
1117
+ "execution_count": null,
1118
+ "outputs": [
1119
+ {
1120
+ "output_type": "execute_result",
1121
+ "data": {
1122
+ "text/plain": [
1123
+ "(44400, 11100, 44400, 11100)"
1124
+ ]
1125
+ },
1126
+ "metadata": {},
1127
+ "execution_count": 4
1128
+ }
1129
+ ]
1130
+ },
1131
+ {
1132
+ "cell_type": "markdown",
1133
+ "source": [
1134
+ "### Model Training"
1135
+ ],
1136
+ "metadata": {
1137
+ "id": "zWz1-JCKnudh"
1138
+ }
1139
+ },
1140
+ {
1141
+ "cell_type": "code",
1142
+ "source": [
1143
+ "from sklearn.ensemble import RandomForestClassifier\n",
1144
+ "from sklearn.metrics import classification_report, accuracy_score\n",
1145
+ "\n",
1146
+ "# Initialize and train the model\n",
1147
+ "model = RandomForestClassifier(n_estimators=100, random_state=42)\n",
1148
+ "model.fit(X_train, y_train)\n",
1149
+ "\n",
1150
+ "# Make predictions\n",
1151
+ "y_pred = model.predict(X_test)\n",
1152
+ "\n",
1153
+ "# Evaluate the model\n",
1154
+ "print(f\"Accuracy: {accuracy_score(y_test, y_pred)}\")\n",
1155
+ "print(classification_report(y_test, y_pred, target_names=target_encoder.classes_))\n"
1156
+ ],
1157
+ "metadata": {
1158
+ "colab": {
1159
+ "base_uri": "https://localhost:8080/"
1160
+ },
1161
+ "id": "aOoTscBpZYVF",
1162
+ "outputId": "b4b4b35d-1e42-4457-ddbb-2ea03f0183c8"
1163
+ },
1164
+ "execution_count": null,
1165
+ "outputs": [
1166
+ {
1167
+ "output_type": "stream",
1168
+ "name": "stdout",
1169
+ "text": [
1170
+ "Accuracy: 0.2036036036036036\n",
1171
+ " precision recall f1-score support\n",
1172
+ "\n",
1173
+ " Aspirin 0.20 0.20 0.20 2211\n",
1174
+ " Ibuprofen 0.21 0.20 0.21 2271\n",
1175
+ " Lipitor 0.21 0.21 0.21 2224\n",
1176
+ " Paracetamol 0.21 0.21 0.21 2207\n",
1177
+ " Penicillin 0.19 0.19 0.19 2187\n",
1178
+ "\n",
1179
+ " accuracy 0.20 11100\n",
1180
+ " macro avg 0.20 0.20 0.20 11100\n",
1181
+ "weighted avg 0.20 0.20 0.20 11100\n",
1182
+ "\n"
1183
+ ]
1184
+ }
1185
+ ]
1186
+ },
1187
+ {
1188
+ "cell_type": "code",
1189
+ "source": [
1190
+ "from sklearn.preprocessing import StandardScaler\n",
1191
+ "from tensorflow.keras.utils import to_categorical\n",
1192
+ "\n",
1193
+ "# Normalize numerical features\n",
1194
+ "scaler = StandardScaler()\n",
1195
+ "X_scaled = scaler.fit_transform(X[['Age']])\n",
1196
+ "X_scaled = pd.DataFrame(X_scaled, columns=['Age'])\n",
1197
+ "\n",
1198
+ "# Concatenate scaled numerical features with encoded categorical features\n",
1199
+ "X_encoded = X.drop(columns=['Age'])\n",
1200
+ "X_final = pd.concat([X_scaled, X_encoded], axis=1)\n",
1201
+ "\n",
1202
+ "# One-hot encode the target variable\n",
1203
+ "y_final = to_categorical(y)\n"
1204
+ ],
1205
+ "metadata": {
1206
+ "id": "T_kRZhaQat3s"
1207
+ },
1208
+ "execution_count": null,
1209
+ "outputs": []
1210
+ },
1211
+ {
1212
+ "cell_type": "code",
1213
+ "source": [
1214
+ "from sklearn.neighbors import KNeighborsClassifier\n",
1215
+ "from sklearn.metrics import classification_report, accuracy_score\n",
1216
+ "\n",
1217
+ "# Initialize the KNN model\n",
1218
+ "knn = KNeighborsClassifier(n_neighbors=5) # You can adjust n_neighbors for better performance\n",
1219
+ "\n",
1220
+ "# Train the model\n",
1221
+ "knn.fit(X_train, y_train)\n",
1222
+ "\n",
1223
+ "# Predict on the test set\n",
1224
+ "y_pred = knn.predict(X_test)\n",
1225
+ "\n",
1226
+ "# Evaluate the model\n",
1227
+ "accuracy = accuracy_score(y_test, y_pred)\n",
1228
+ "print(f\"Test Accuracy: {accuracy}\")\n",
1229
+ "print(classification_report(y_test, y_pred, target_names=target_encoder.classes_))\n"
1230
+ ],
1231
+ "metadata": {
1232
+ "colab": {
1233
+ "base_uri": "https://localhost:8080/"
1234
+ },
1235
+ "id": "6jp6Gcqth5cB",
1236
+ "outputId": "b34b1aba-d90b-4b5a-f0bd-a51a8a0c6015"
1237
+ },
1238
+ "execution_count": null,
1239
+ "outputs": [
1240
+ {
1241
+ "output_type": "stream",
1242
+ "name": "stdout",
1243
+ "text": [
1244
+ "Test Accuracy: 0.2018018018018018\n",
1245
+ " precision recall f1-score support\n",
1246
+ "\n",
1247
+ " Aspirin 0.19 0.29 0.23 2211\n",
1248
+ " Ibuprofen 0.21 0.23 0.22 2271\n",
1249
+ " Lipitor 0.21 0.20 0.20 2224\n",
1250
+ " Paracetamol 0.20 0.16 0.18 2207\n",
1251
+ " Penicillin 0.20 0.13 0.16 2187\n",
1252
+ "\n",
1253
+ " accuracy 0.20 11100\n",
1254
+ " macro avg 0.20 0.20 0.20 11100\n",
1255
+ "weighted avg 0.20 0.20 0.20 11100\n",
1256
+ "\n"
1257
+ ]
1258
+ }
1259
+ ]
1260
+ },
1261
+ {
1262
+ "cell_type": "markdown",
1263
+ "source": [
1264
+ "### FINAL"
1265
+ ],
1266
+ "metadata": {
1267
+ "id": "gFUsQMWP87EE"
1268
+ }
1269
+ },
1270
+ {
1271
+ "cell_type": "code",
1272
+ "source": [
1273
+ "\n",
1274
+ "import pandas as pd\n",
1275
+ "from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
1276
+ "from sklearn.neighbors import KNeighborsClassifier\n",
1277
+ "from sklearn.model_selection import train_test_split\n",
1278
+ "from sklearn.metrics import accuracy_score, classification_report\n",
1279
+ "import joblib\n",
1280
+ "\n",
1281
+ "# Load the dataset\n",
1282
+ "data = pd.read_csv('/content/healthcare_dataset.csv')\n",
1283
+ "\n",
1284
+ "# If 'Medication' column is numeric, manually map them to their names\n",
1285
+ "medication_mapping = {\n",
1286
+ " 0: 'Aspirin',\n",
1287
+ " 1: 'Ibuprofen',\n",
1288
+ " 2: 'Lipitor',\n",
1289
+ " 3: 'Paracetamol',\n",
1290
+ " 4: 'Penicillin'\n",
1291
+ "}\n",
1292
+ "\n",
1293
+ "# Encode categorical features\n",
1294
+ "label_encoders = {}\n",
1295
+ "for column in ['Gender', 'Blood Type', 'Medical Condition', 'Test Results']:\n",
1296
+ " le = LabelEncoder()\n",
1297
+ " data[column] = le.fit_transform(data[column])\n",
1298
+ " label_encoders[column] = le\n",
1299
+ "\n",
1300
+ "# Encode the target variable 'Medication'\n",
1301
+ "medication_encoder = LabelEncoder()\n",
1302
+ "data['Medication'] = medication_encoder.fit_transform(data['Medication'])\n",
1303
+ "\n",
1304
+ "# Define features and target\n",
1305
+ "X = data[['Age', 'Gender', 'Blood Type', 'Medical Condition', 'Test Results']]\n",
1306
+ "y = data['Medication']\n",
1307
+ "\n",
1308
+ "# Split the dataset into training and testing sets\n",
1309
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
1310
+ "\n",
1311
+ "# Normalize ONLY the 'Age' column\n",
1312
+ "age_scaler = StandardScaler()\n",
1313
+ "X_train['Age'] = age_scaler.fit_transform(X_train[['Age']])\n",
1314
+ "X_test['Age'] = age_scaler.transform(X_test[['Age']])"
1315
+ ],
1316
+ "metadata": {
1317
+ "id": "dMaqw6Ao7iJC"
1318
+ },
1319
+ "execution_count": null,
1320
+ "outputs": []
1321
+ },
1322
+ {
1323
+ "cell_type": "code",
1324
+ "source": [
1325
+ "# Initialize and train the KNN model\n",
1326
+ "knn = KNeighborsClassifier(n_neighbors=5)\n",
1327
+ "knn.fit(X_train, y_train)\n",
1328
+ "\n",
1329
+ "# Evaluate the model on the test set\n",
1330
+ "y_pred = knn.predict(X_test)\n",
1331
+ "accuracy = accuracy_score(y_test, y_pred)\n",
1332
+ "print(f\"Test Accuracy: {accuracy}\")\n",
1333
+ "\n",
1334
+ "# Print the classification report\n",
1335
+ "print(\"Classification Report:\")\n",
1336
+ "print(classification_report(y_test, y_pred, target_names=medication_encoder.classes_))\n"
1337
+ ],
1338
+ "metadata": {
1339
+ "colab": {
1340
+ "base_uri": "https://localhost:8080/"
1341
+ },
1342
+ "id": "ux4L1tsX9CS2",
1343
+ "outputId": "52e4b74c-22ec-4934-f80d-f5e37b893326"
1344
+ },
1345
+ "execution_count": null,
1346
+ "outputs": [
1347
+ {
1348
+ "output_type": "stream",
1349
+ "name": "stdout",
1350
+ "text": [
1351
+ "Test Accuracy: 0.20306306306306307\n",
1352
+ "Classification Report:\n",
1353
+ " precision recall f1-score support\n",
1354
+ "\n",
1355
+ " Aspirin 0.20 0.29 0.24 2211\n",
1356
+ " Ibuprofen 0.21 0.23 0.22 2271\n",
1357
+ " Lipitor 0.22 0.21 0.21 2224\n",
1358
+ " Paracetamol 0.20 0.16 0.17 2207\n",
1359
+ " Penicillin 0.18 0.13 0.15 2187\n",
1360
+ "\n",
1361
+ " accuracy 0.20 11100\n",
1362
+ " macro avg 0.20 0.20 0.20 11100\n",
1363
+ "weighted avg 0.20 0.20 0.20 11100\n",
1364
+ "\n"
1365
+ ]
1366
+ }
1367
+ ]
1368
+ },
1369
+ {
1370
+ "cell_type": "markdown",
1371
+ "source": [
1372
+ "### Testing"
1373
+ ],
1374
+ "metadata": {
1375
+ "id": "6TbYU2UKn0DJ"
1376
+ }
1377
+ },
1378
+ {
1379
+ "cell_type": "code",
1380
+ "source": [
1381
+ "# Example new data for prediction\n",
1382
+ "new_data = pd.DataFrame({\n",
1383
+ " 'Age': [62],\n",
1384
+ " 'Gender': ['Male'],\n",
1385
+ " 'Blood Type': ['A+'],\n",
1386
+ " 'Medical Condition': ['Obesity'],\n",
1387
+ " 'Test Results': ['Normal']\n",
1388
+ "})\n",
1389
+ "\n",
1390
+ "# Encode the new data using the same label encoders\n",
1391
+ "for column in ['Gender', 'Blood Type', 'Medical Condition', 'Test Results']:\n",
1392
+ " new_data[column] = label_encoders[column].transform(new_data[column])\n",
1393
+ "\n",
1394
+ "# Normalize the 'Age' column in the new data\n",
1395
+ "new_data['Age'] = age_scaler.transform(new_data[['Age']])\n",
1396
+ "\n",
1397
+ "# Make predictions\n",
1398
+ "predictions = knn.predict(new_data)\n",
1399
+ "\n",
1400
+ "# Decode the predictions back to the original medication names\n",
1401
+ "predicted_medications = medication_encoder.inverse_transform(predictions)\n",
1402
+ "\n",
1403
+ "print(f\"Predicted Medication: {predicted_medications[0]}\")\n"
1404
+ ],
1405
+ "metadata": {
1406
+ "colab": {
1407
+ "base_uri": "https://localhost:8080/"
1408
+ },
1409
+ "id": "ubmJkLPj9ELT",
1410
+ "outputId": "aff25ba4-1459-47a1-e813-257a0faad04a"
1411
+ },
1412
+ "execution_count": null,
1413
+ "outputs": [
1414
+ {
1415
+ "output_type": "stream",
1416
+ "name": "stdout",
1417
+ "text": [
1418
+ "Predicted Medication: Ibuprofen\n"
1419
+ ]
1420
+ }
1421
+ ]
1422
+ },
1423
+ {
1424
+ "cell_type": "markdown",
1425
+ "source": [
1426
+ "### Saving"
1427
+ ],
1428
+ "metadata": {
1429
+ "id": "qyMS8mQnn2Dx"
1430
+ }
1431
+ },
1432
+ {
1433
+ "cell_type": "code",
1434
+ "source": [
1435
+ "# Save the trained model, label encoders, and age scaler\n",
1436
+ "joblib.dump(knn, 'knn_model.pkl')\n",
1437
+ "joblib.dump(label_encoders, 'label_encoders.pkl')\n",
1438
+ "joblib.dump(age_scaler, 'age_scaler.pkl')\n",
1439
+ "joblib.dump(medication_encoder, 'medication_encoder.pkl')\n",
1440
+ "\n",
1441
+ "print(\"Model and encoders saved successfully.\")\n"
1442
+ ],
1443
+ "metadata": {
1444
+ "colab": {
1445
+ "base_uri": "https://localhost:8080/"
1446
+ },
1447
+ "id": "WOitiTRa9Gxa",
1448
+ "outputId": "61bbdb60-b67f-4719-e0be-bf78b88df92b"
1449
+ },
1450
+ "execution_count": null,
1451
+ "outputs": [
1452
+ {
1453
+ "output_type": "stream",
1454
+ "name": "stdout",
1455
+ "text": [
1456
+ "Model and encoders saved successfully.\n"
1457
+ ]
1458
+ }
1459
+ ]
1460
+ }
1461
+ ]
1462
+ }