---
license: apache-2.0
language:
- en
base_model:
- PMDEVS/explorers_emit_model
pipeline_tag: tabular-classification
---

## EMIT Model - Environmental Monitoring and Intelligence Tool

### Title

**EMIT Model** - Environmental Monitoring and Intelligence Tool (CatBoost Classifier)

---

### Overview

The **EMIT Model** (Environmental Monitoring and Intelligence Tool) is a **CatBoost Classifier** that predicts potential mining areas from environmental data. It is part of the **EMiTAL** (Environmental Monitoring and Intelligence Tool Algorithm) framework and uses **Remote Sensing**, **RayCasting**, and **Polygon Gridding** to identify viable mining zones.

#### Goal

The goal is to support decision-making in mining with a predictive model that identifies areas with high mining potential from their environmental characteristics. The model is intended for regulatory bodies, mining companies, and environmental agencies that aim to balance resource extraction with sustainability.

---

### Framework: EMiTAL

The **EMiTAL framework** combines the following techniques and indicators:

- **Remote Sensing**: Captures large-scale environmental data (e.g., vegetation, soil, and air quality).
- **RayCasting and Polygon Gridding**: Segments geographic regions into grid cells so that each cell can be assessed individually.
- **Environmental Indicators**:
  - **NDVI (Normalized Difference Vegetation Index)**: Measures vegetation health.
  - **NDWI (Normalized Difference Water Index)**: Evaluates water content.
  - **NDTI (Normalized Difference Tillage Index)**: Assesses soil disturbance.
  - **Land Elevation**: Provides terrain insights.
  - **Air Quality Metrics**: NO2, PM10, and CO to gauge environmental impact.
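
The gridding step can be illustrated with a minimal sketch. It assumes the textbook ray-casting point-in-polygon test and square cells laid over the polygon's bounding box; the function names, the `cell_size` parameter, and the sample polygon are hypothetical, and the actual EMiTAL implementation may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: cast a horizontal ray from `point` and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-level can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge reaches that y-level.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def grid_polygon(polygon: List[Point], cell_size: float) -> List[Point]:
    """Return the centres of square grid cells whose centre lies inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, min_y = min(xs), min(ys)
    nx = math.ceil((max(xs) - min_x) / cell_size)
    ny = math.ceil((max(ys) - min_y) / cell_size)
    centres = []
    for j in range(ny):
        for i in range(nx):
            centre = (min_x + (i + 0.5) * cell_size, min_y + (j + 0.5) * cell_size)
            if point_in_polygon(centre, polygon):
                centres.append(centre)
    return centres


# Hypothetical 4 x 4 square area of interest (not taken from the EMIT dataset).
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
cells = grid_polygon(square, cell_size=1.0)
print(len(cells))  # 16 cell centres
```

Each returned centre could then serve as the coordinate at which the environmental indicators are sampled.
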

---

### Model Pipeline

The pipeline preprocesses the environmental data and fits a classifier to it. Because CatBoost handles categorical features natively, little preprocessing is needed.

- **Model Type**: CatBoost Classifier
- **Objective**: Binary classification to predict whether a region is suitable for mining (`True` for viable, `False` for non-viable).
- **Cross-Validation Results**:
  - Mean Accuracy: **78.32%**
  - Standard Deviation: **4.25%**
- **Final Accuracy on Test Data**: **90.32%**
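
A cross-validated CatBoost run of this kind could look roughly like the sketch below. The column names, the target name, the fold count, and the seed are all assumptions made for illustration; the model card does not state the training configuration, so this is not the authors' script.

```python
# Hypothetical column names; the real training dataset may use different ones.
FEATURES = [
    "latitude", "longitude", "ndvi", "ndwi", "ndti",
    "land_elevation", "vegetation_index", "no2", "pm10", "co",
]
CAT_FEATURES = ["vegetation_index"]  # string categories such as "Sparse"
TARGET = "is_mining_viable"          # hypothetical boolean label column


def cross_validate_emit(df, n_splits=5, seed=42):
    """Return (mean, std) of k-fold accuracy for a CatBoost classifier on `df`."""
    # Imported lazily so this sketch loads even where the optional
    # dependencies (catboost, scikit-learn) are not installed.
    from catboost import CatBoostClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    model = CatBoostClassifier(
        cat_features=CAT_FEATURES,  # CatBoost encodes these columns itself
        random_seed=seed,
        verbose=0,
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(
        model, df[FEATURES], df[TARGET], cv=cv, scoring="accuracy"
    )
    return scores.mean(), scores.std()
```

Passing `cat_features` to the constructor rather than to `fit` lets scikit-learn clone the estimator for each fold without extra fit parameters.
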

---

### Dataset and Features

#### Input Features:

- **Latitude** and **Longitude**: Geospatial coordinates.
- **NDVI, NDWI, NDTI**: Environmental indices critical for mining predictions.
- **Land Elevation**: Topographic information.
- **Vegetation Index**: Encoded categories (Null, Sparse, Moderate, Healthy).
- **Air Quality Metrics**: NO2, PM10, and CO levels.
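
The three normalized-difference indices share one formula, `(a - b) / (a + b)`, applied to different band pairs. The sketch below assumes the commonly used pairs: NIR and red for NDVI, green and NIR for NDWI (the McFeeters variant), and the two shortwave-infrared bands for NDTI. The model card does not say which bands or sensor were used, so these pairings and the sample reflectance values are illustrative only.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when both inputs are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndvi(nir: float, red: float) -> float:
    """Vegetation index: high when NIR reflectance dominates red."""
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Water index (assumed McFeeters form): high over open water."""
    return normalized_difference(green, nir)


def ndti(swir1: float, swir2: float) -> float:
    """Tillage index (assumed SWIR1/SWIR2 form): responds to bare or disturbed soil."""
    return normalized_difference(swir1, swir2)


# Made-up surface reflectances for a single pixel.
print(round(ndvi(nir=0.5, red=0.1), 3))      # 0.667
print(round(ndwi(green=0.1, nir=0.5), 3))    # -0.667
print(round(ndti(swir1=0.3, swir2=0.2), 3))  # 0.2
```

All three indices fall in the range -1 to 1, which keeps them on a comparable scale as model inputs.
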

#### Initial Dataset:

- **Total Records**: 152
- **Data Types**: Numerical, categorical, and boolean.
- **Categorical Features**: Vegetation Index, handled natively by CatBoost.

---

### Model Performance

#### Key Metrics:

- **Accuracy**: **90.32%** (28 of 31 test records classified correctly)
- **Precision, Recall, F1-Score**:

| **Class** | **Precision** | **Recall** | **F1-Score** | **Support** |
|------------|---------------|------------|--------------|-------------|
| **False** | 0.86 | 0.75 | 0.80 | 8 |
| **True** | 0.92 | 0.96 | 0.94 | 23 |

- **Overall Accuracy**: **90%**
- **Macro Average**: Precision = 0.89, Recall = 0.85, F1-Score = 0.87
- **Weighted Average**: Precision = 0.90, Recall = 0.90, F1-Score = 0.90

#### Confusion Matrix:

| | Predicted False | Predicted True |
|---------------|-----------------|----------------|
| **Actual False** | 6 | 2 |
| **Actual True** | 1 | 22 |
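
The per-class metrics follow directly from the confusion matrix. As a worked check, the short script below recomputes them from the four counts in the table; the helper name `prf` is introduced here for illustration.

```python
# Counts from the confusion matrix: rows are actual, columns are predicted.
tn, fp = 6, 2    # actual False: predicted False, predicted True
fn, tp = 1, 22   # actual True:  predicted False, predicted True


def prf(true_pos: int, false_pos: int, false_neg: int):
    """Return (precision, recall, F1) for one class treated as positive."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


accuracy = (tp + tn) / (tp + tn + fp + fn)
true_class = prf(tp, fp, fn)
false_class = prf(tn, fn, fp)  # swap roles: False is the positive class here

print(f"accuracy: {accuracy:.4f}")                       # accuracy: 0.9032
print("True :", [round(v, 2) for v in true_class])      # [0.92, 0.96, 0.94]
print("False:", [round(v, 2) for v in false_class])     # [0.86, 0.75, 0.8]
```
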

---

### Feature Importance

The model reports the following feature importances, listed from most to least influential:

| **Feature** | **Importance (%)** |
|-------------------------------|--------------------|
| Longitude | 40.50 |
| NO2 | 25.81 |
| Latitude | 19.43 |
| NDWI | 4.85 |
| NDVI | 4.60 |
| NDTI | 4.41 |
| Vegetation Index (Encoded) | 0.30 |
| Land Elevation | 0.10 |
| PM10 | 0.00 |
| CO | 0.00 |

Longitude, NO2, and Latitude together account for about 86% of the total importance, while PM10 and CO contribute nothing.

---

### Usage Instructions

To use this model:

1. Prepare your dataset with the specified input features.
2. Ensure feature names match the training dataset.
3. Run predictions using the following script:

```python
import joblib
import pandas as pd

# Load the trained model
model = joblib.load("emit_model_catboost.joblib")

# Load your data; column names must match the training features
data = pd.read_csv("path/to/your/data.csv")

# Predict the class for each row (`True` = viable, `False` = non-viable)
predictions = model.predict(data)
print(predictions)
```

---

### Authors

- Joseph Ackon
- Felix Kudjo Mlagada
- Aristotle Mbroh
- Prince Mawuko Dzorkpe
- Manford Ehuntem

**Acknowledgments**:
Thanks to **Takoradi Technical University**, **Data Hackathon Ghana Statistical Service (2024)**, and **StatsBank** for their support.

---

This version of the EMIT model is optimized with CatBoost for better performance on mixed-type datasets.