{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Student Performance Indicator\n",
    "- Understanding the problem statement\n",
    "- Data Collection'\n",
    "- Data Checks\n",
    "- Exploratory Data Analysis\n",
    "- Data Pre-processing\n",
    "- Model Training\n",
    "- Choose the best model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1) Problem Statement\n",
    "- This project understands how the student's performace (test score) is affected by other variables such gender, ethinicity, parental education, test preparation course.\n",
    "\n",
    "### 2) Data Collection\n",
    "- Data Source - https://www.kaggle.com/datasets/spscientist/students-performance-in-exams?datasetId=74977\n",
    "- The data consist of 8 columns and 1000 rows.\n",
    "\n",
    "#### 2.1 Import Data and Required Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import CSV data as Pandas DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('data/stud.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Show top and bottom records"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gender</th>\n",
       "      <th>race_ethnicity</th>\n",
       "      <th>parental_level_of_education</th>\n",
       "      <th>lunch</th>\n",
       "      <th>test_preparation_course</th>\n",
       "      <th>math_score</th>\n",
       "      <th>reading_score</th>\n",
       "      <th>writing_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>female</td>\n",
       "      <td>group B</td>\n",
       "      <td>bachelor's degree</td>\n",
       "      <td>standard</td>\n",
       "      <td>none</td>\n",
       "      <td>72</td>\n",
       "      <td>72</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>female</td>\n",
       "      <td>group C</td>\n",
       "      <td>some college</td>\n",
       "      <td>standard</td>\n",
       "      <td>completed</td>\n",
       "      <td>69</td>\n",
       "      <td>90</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>female</td>\n",
       "      <td>group B</td>\n",
       "      <td>master's degree</td>\n",
       "      <td>standard</td>\n",
       "      <td>none</td>\n",
       "      <td>90</td>\n",
       "      <td>95</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>male</td>\n",
       "      <td>group A</td>\n",
       "      <td>associate's degree</td>\n",
       "      <td>free/reduced</td>\n",
       "      <td>none</td>\n",
       "      <td>47</td>\n",
       "      <td>57</td>\n",
       "      <td>44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>male</td>\n",
       "      <td>group C</td>\n",
       "      <td>some college</td>\n",
       "      <td>standard</td>\n",
       "      <td>none</td>\n",
       "      <td>76</td>\n",
       "      <td>78</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   gender race_ethnicity parental_level_of_education         lunch  \\\n",
       "0  female        group B           bachelor's degree      standard   \n",
       "1  female        group C                some college      standard   \n",
       "2  female        group B             master's degree      standard   \n",
       "3    male        group A          associate's degree  free/reduced   \n",
       "4    male        group C                some college      standard   \n",
       "\n",
       "  test_preparation_course  math_score  reading_score  writing_score  \n",
       "0                    none          72             72             74  \n",
       "1               completed          69             90             88  \n",
       "2                    none          90             95             93  \n",
       "3                    none          47             57             44  \n",
       "4                    none          76             78             75  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gender</th>\n",
       "      <th>race_ethnicity</th>\n",
       "      <th>parental_level_of_education</th>\n",
       "      <th>lunch</th>\n",
       "      <th>test_preparation_course</th>\n",
       "      <th>math_score</th>\n",
       "      <th>reading_score</th>\n",
       "      <th>writing_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>995</th>\n",
       "      <td>female</td>\n",
       "      <td>group E</td>\n",
       "      <td>master's degree</td>\n",
       "      <td>standard</td>\n",
       "      <td>completed</td>\n",
       "      <td>88</td>\n",
       "      <td>99</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>996</th>\n",
       "      <td>male</td>\n",
       "      <td>group C</td>\n",
       "      <td>high school</td>\n",
       "      <td>free/reduced</td>\n",
       "      <td>none</td>\n",
       "      <td>62</td>\n",
       "      <td>55</td>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>997</th>\n",
       "      <td>female</td>\n",
       "      <td>group C</td>\n",
       "      <td>high school</td>\n",
       "      <td>free/reduced</td>\n",
       "      <td>completed</td>\n",
       "      <td>59</td>\n",
       "      <td>71</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>998</th>\n",
       "      <td>female</td>\n",
       "      <td>group D</td>\n",
       "      <td>some college</td>\n",
       "      <td>standard</td>\n",
       "      <td>completed</td>\n",
       "      <td>68</td>\n",
       "      <td>78</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>999</th>\n",
       "      <td>female</td>\n",
       "      <td>group D</td>\n",
       "      <td>some college</td>\n",
       "      <td>free/reduced</td>\n",
       "      <td>none</td>\n",
       "      <td>77</td>\n",
       "      <td>86</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     gender race_ethnicity parental_level_of_education         lunch  \\\n",
       "995  female        group E             master's degree      standard   \n",
       "996    male        group C                 high school  free/reduced   \n",
       "997  female        group C                 high school  free/reduced   \n",
       "998  female        group D                some college      standard   \n",
       "999  female        group D                some college  free/reduced   \n",
       "\n",
       "    test_preparation_course  math_score  reading_score  writing_score  \n",
       "995               completed          88             99             95  \n",
       "996                    none          62             55             55  \n",
       "997               completed          59             71             65  \n",
       "998               completed          68             78             77  \n",
       "999                    none          77             86             86  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.2 Data Information\n",
    "\n",
    "\n",
    "- gender: sex of students\n",
    "- race_ethinicity: ethinicity of students\n",
    "- parental_level_of_education: parents' highest education\n",
    "- lunch: having lunch before test\n",
    "- test preparation course: completed of not before test\n",
    "- math score,  reading score, writing score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "shape of the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1000, 8)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3) Data Checks\n",
    "- check missing values\n",
    "- check duplicates\n",
    "- Check Datatypes\n",
    "- Check the number of unique values of each column\n",
    "- Check Statistics of dataset\n",
    "- check various categories present in the different categorical columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.1 Missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "gender                         0.0\n",
       "race_ethnicity                 0.0\n",
       "parental_level_of_education    0.0\n",
       "lunch                          0.0\n",
       "test_preparation_course        0.0\n",
       "math_score                     0.0\n",
       "reading_score                  0.0\n",
       "writing_score                  0.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isnull().mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### observation \n",
    "- no missing values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.2 Check Duplicates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.duplicated().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### observation\n",
    "- no duplicates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.3 Check Data Types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 1000 entries, 0 to 999\n",
      "Data columns (total 8 columns):\n",
      " #   Column                       Non-Null Count  Dtype \n",
      "---  ------                       --------------  ----- \n",
      " 0   gender                       1000 non-null   object\n",
      " 1   race_ethnicity               1000 non-null   object\n",
      " 2   parental_level_of_education  1000 non-null   object\n",
      " 3   lunch                        1000 non-null   object\n",
      " 4   test_preparation_course      1000 non-null   object\n",
      " 5   math_score                   1000 non-null   int64 \n",
      " 6   reading_score                1000 non-null   int64 \n",
      " 7   writing_score                1000 non-null   int64 \n",
      "dtypes: int64(3), object(5)\n",
      "memory usage: 62.6+ KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### observation\n",
    "- 5 categorical columns\n",
    "- 3 numerical columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4 Number of Unique Values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "gender                          2\n",
       "race_ethnicity                  5\n",
       "parental_level_of_education     6\n",
       "lunch                           2\n",
       "test_preparation_course         2\n",
       "math_score                     81\n",
       "reading_score                  72\n",
       "writing_score                  77\n",
       "dtype: int64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.nunique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### observation\n",
    "- gender, lunch and test_preparation_course are binary columns i.e. 2 unique values.\n",
    "- race_ethinicity has 5 unique and parental_level_education has 6 unique values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.5 Check Statistics of dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>math_score</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>66.089</td>\n",
       "      <td>15.163080</td>\n",
       "      <td>0.0</td>\n",
       "      <td>57.00</td>\n",
       "      <td>66.0</td>\n",
       "      <td>77.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reading_score</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>69.169</td>\n",
       "      <td>14.600192</td>\n",
       "      <td>17.0</td>\n",
       "      <td>59.00</td>\n",
       "      <td>70.0</td>\n",
       "      <td>79.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>writing_score</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>68.054</td>\n",
       "      <td>15.195657</td>\n",
       "      <td>10.0</td>\n",
       "      <td>57.75</td>\n",
       "      <td>69.0</td>\n",
       "      <td>79.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                count    mean        std   min    25%   50%   75%    max\n",
       "math_score     1000.0  66.089  15.163080   0.0  57.00  66.0  77.0  100.0\n",
       "reading_score  1000.0  69.169  14.600192  17.0  59.00  70.0  79.0  100.0\n",
       "writing_score  1000.0  68.054  15.195657  10.0  57.75  69.0  79.0  100.0"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe().T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### observation\n",
    "- math_score has a minimum score of zero.\n",
    "- distribution of three columns are similar in nature."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "categorical_cols: ['gender', 'race_ethnicity', 'parental_level_of_education', 'lunch', 'test_preparation_course']\n",
      "length of categorical columns: 5\n",
      "\n",
      "numerical_cols: ['math_score', 'reading_score', 'writing_score']\n",
      "length of numerical columns: 3\n"
     ]
    }
   ],
   "source": [
    "## define categorical and numerical columns\n",
    "\n",
    "categorical_cols = [col for col in df.columns if df[col].dtype=='O']\n",
    "numerical_cols = [col for col in df.columns if col not in categorical_cols]\n",
    "\n",
    "print(f\"categorical_cols: {categorical_cols}\")\n",
    "print(f\"length of categorical columns: {len(categorical_cols)}\\n\")\n",
    "\n",
    "print(f\"numerical_cols: {numerical_cols}\")\n",
    "print(f\"length of numerical columns: {len(numerical_cols)}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.6 Categories of Categorical Columns\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Categories of gender ['female' 'male']: \n",
      "\n",
      "Categories of race_ethnicity ['group B' 'group C' 'group A' 'group D' 'group E']: \n",
      "\n",
      "Categories of parental_level_of_education [\"bachelor's degree\" 'some college' \"master's degree\" \"associate's degree\"\n",
      " 'high school' 'some high school']: \n",
      "\n",
      "Categories of lunch ['standard' 'free/reduced']: \n",
      "\n",
      "Categories of test_preparation_course ['none' 'completed']: \n",
      "\n"
     ]
    }
   ],
   "source": [
    "for col in categorical_cols:\n",
    "    print(\"Categories of {} {}: \\n\".format(col,df[col].unique()))    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4) Exploratory Data Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gender</th>\n",
       "      <th>race_ethnicity</th>\n",
       "      <th>parental_level_of_education</th>\n",
       "      <th>lunch</th>\n",
       "      <th>test_preparation_course</th>\n",
       "      <th>math_score</th>\n",
       "      <th>reading_score</th>\n",
       "      <th>writing_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>female</td>\n",
       "      <td>group B</td>\n",
       "      <td>bachelor's degree</td>\n",
       "      <td>standard</td>\n",
       "      <td>none</td>\n",
       "      <td>72</td>\n",
       "      <td>72</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>female</td>\n",
       "      <td>group C</td>\n",
       "      <td>some college</td>\n",
       "      <td>standard</td>\n",
       "      <td>completed</td>\n",
       "      <td>69</td>\n",
       "      <td>90</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>female</td>\n",
       "      <td>group B</td>\n",
       "      <td>master's degree</td>\n",
       "      <td>standard</td>\n",
       "      <td>none</td>\n",
       "      <td>90</td>\n",
       "      <td>95</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>male</td>\n",
       "      <td>group A</td>\n",
       "      <td>associate's degree</td>\n",
       "      <td>free/reduced</td>\n",
       "      <td>none</td>\n",
       "      <td>47</td>\n",
       "      <td>57</td>\n",
       "      <td>44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>male</td>\n",
       "      <td>group C</td>\n",
       "      <td>some college</td>\n",
       "      <td>standard</td>\n",
       "      <td>none</td>\n",
       "      <td>76</td>\n",
       "      <td>78</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   gender race_ethnicity parental_level_of_education         lunch  \\\n",
       "0  female        group B           bachelor's degree      standard   \n",
       "1  female        group C                some college      standard   \n",
       "2  female        group B             master's degree      standard   \n",
       "3    male        group A          associate's degree  free/reduced   \n",
       "4    male        group C                some college      standard   \n",
       "\n",
       "  test_preparation_course  math_score  reading_score  writing_score  \n",
       "0                    none          72             72             74  \n",
       "1               completed          69             90             88  \n",
       "2                    none          90             95             93  \n",
       "3                    none          47             57             44  \n",
       "4                    none          76             78             75  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of students with full marks in Maths: 7\n",
      "Number of students with full marks in Writing: 14\n",
      "Number of students with full marks in Reading: 17\n"
     ]
    }
   ],
   "source": [
    "reading_full = df[df['reading_score'] == 100].shape[0]\n",
    "writing_full = df[df['writing_score'] == 100].shape[0]\n",
    "math_full = df[df['math_score'] == 100].shape[0]\n",
    "\n",
    "\n",
    "print(f\"Number of students with full marks in Maths: {math_full}\")\n",
    "print(f\"Number of students with full marks in Writing: {writing_full}\")\n",
    "print(f\"Number of students with full marks in Reading: {reading_full}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of students with marks less than 30 in Maths: 16\n",
      "Number of students with marks less than 30 in Writing: 10\n",
      "Number of students with marks less than 30 in Reading: 8\n"
     ]
    }
   ],
   "source": [
    "reading_less_30 = df[df['reading_score'] <= 30].shape[0]\n",
    "writing_less_30 = df[df['writing_score'] <= 30].shape[0]\n",
    "math_less_30 = df[df['math_score'] <= 30].shape[0]\n",
    "\n",
    "\n",
    "print(f\"Number of students with marks less than 30 in Maths: {math_less_30}\")\n",
    "print(f\"Number of students with marks less than 30 in Writing: {writing_less_30}\")\n",
    "print(f\"Number of students with marks less than 30 in Reading: {reading_less_30}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### observation\n",
    "- student have performed well in reading.\n",
    "- students have performed worst in maths.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}