	---
	license: apache-2.0
	language:
	- en
	base_model:
	- PMDEVS/explorers_emit_model
	pipeline_tag: tabular-classification
	---


	## EMIT Model - Environmental Monitoring and Intelligence Tool

	### Title
	EMIT Model - Environmental Monitoring and Intelligence Tool (CatBoost Classifier)

	---

	### Overview
	The EMIT Model (Environmental Monitoring and Intelligence Tool) is a CatBoost Classifier that predicts potential mining areas from environmental data. It is part of the EMiTAL (Environmental Monitoring and Intelligence Tool Algorithm) framework, which combines Remote Sensing, RayCasting, and Polygon Gridding to identify viable mining zones at the level of individual grid cells.

	#### Goal
	To support decision-making in mining by providing a robust predictive model that identifies areas with high mining potential based on environmental characteristics. This model benefits regulatory bodies, mining companies, and environmental agencies aiming to balance resource extraction with sustainability.

	---

	### Framework: EMiTAL
	The EMiTAL framework integrates several innovative approaches to enhance prediction accuracy:
	- Remote Sensing: Captures large-scale environmental data (e.g., vegetation, soil, and air quality).
	- RayCasting and Polygon Gridding: Segments geographic regions into grids, enabling precise targeting.
	- Environmental Indicators:
	  - NDVI (Normalized Difference Vegetation Index): Measures vegetation health.
	  - NDWI (Normalized Difference Water Index): Evaluates water content.
	  - NDTI (Normalized Difference Tillage Index): Assesses soil disturbance.
	  - Land Elevation: Provides terrain insights.
	  - Air Quality Metrics: NO2, PM10, and CO to gauge environmental impact.
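The gridding step described above can be illustrated with a minimal sketch. It assumes a region is given as a list of (longitude, latitude) vertices and that a grid cell is kept when its center falls inside the region, tested with the standard ray casting (even-odd) rule. The function names and the center-based rule are illustrative assumptions, not the EMiTAL implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: cast a horizontal ray to the right and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line through y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def grid_cell_centers(polygon: List[Point], step: float) -> List[Point]:
    """Return centers of square cells (side `step`) whose center lies in the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    centers = []
    y = min(ys) + step / 2
    while y < max(ys):
        x = min(xs) + step / 2
        while x < max(xs):
            if point_in_polygon((x, y), polygon):
                centers.append((x, y))
            x += step
        y += step
    return centers
```

Each returned center could then be paired with the indicator values sampled at that location to form one row of model input.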

	---

	### Model Pipeline
	The model pipeline is built to preprocess and optimize environmental data for classification. Using CatBoost’s native handling of categorical data, the pipeline minimizes preprocessing complexity while ensuring high performance.

	- Model Type: CatBoost Classifier
	- Objective: Binary classification to predict if a region is suitable for mining (`True` for viable, `False` for non-viable).
	- Cross-Validation Results:
	  - Mean Accuracy: 78.32%
	  - Standard Deviation: 4.25%
	- Final Accuracy on Test Data: 90.32%

	---

	### Dataset and Features
	#### Input Features:
	- Latitude and Longitude: Geospatial coordinates.
	- NDVI, NDWI, NDTI: Environmental indices critical for mining predictions.
	- Land Elevation: Topographic information.
	- Vegetation Index: Encoded categories (Null, Sparse, Moderate, Healthy).
	- Air Quality Metrics: NO2, PM10, and CO levels.
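NDVI, NDWI, and NDTI are all normalized differences of two spectral bands, so a single helper covers them. The sketch below shows the general form and the conventional NDVI band pair (near-infrared and red). The band pairs for NDWI and NDTI depend on the sensor and the definition used, which this model card does not state, so they are left to the caller. Returning 0.0 when both bands are zero is an assumption made here to avoid division by zero.

```python
def normalized_difference(band_a: float, band_b: float) -> float:
    """Return (a - b) / (a + b); 0.0 if both bands are zero (assumed convention)."""
    total = band_a + band_b
    if total == 0:
        return 0.0
    return (band_a - band_b) / total


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red), using surface reflectance values."""
    return normalized_difference(nir, red)


# Example: dense vegetation reflects strongly in NIR and weakly in red.
print(round(ndvi(nir=0.5, red=0.1), 3))
```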

	#### Initial Dataset:
	- Total Records: 152
	- Data Types: Numerical, categorical, and boolean.
	- Categorical Features: Vegetation Index, handled natively by CatBoost.

	---

	### Model Performance
	#### Key Metrics:
	- Accuracy: 90.32%
	- Precision, Recall, F1-Score:

	| Class | Precision | Recall | F1-Score | Support |
	|-------|-----------|--------|----------|---------|
	| False | 0.86      | 0.75   | 0.80     | 8       |
	| True  | 0.92      | 0.96   | 0.94     | 23      |

	- Overall Accuracy: 90%
	- Macro Average: Precision = 0.89, Recall = 0.85, F1-Score = 0.87
	- Weighted Average: Precision = 0.90, Recall = 0.90, F1-Score = 0.90

	#### Confusion Matrix:
	|              | Predicted False | Predicted True |
	|--------------|-----------------|----------------|
	| Actual False | 6               | 2              |
	| Actual True  | 1               | 22             |
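The reported accuracy, precision, recall, and F1 values follow directly from the four confusion matrix counts. The short calculation below reproduces them, treating `True` (viable) as the positive class; the variable names are illustrative.

```python
# Counts from the confusion matrix: rows are actual classes, columns are predictions.
tn, fp = 6, 2    # actual False: predicted False, predicted True
fn, tp = 1, 22   # actual True:  predicted False, predicted True


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


accuracy = (tp + tn) / (tp + tn + fp + fn)

# Class "True": how many predicted-viable cells are viable, and how many viable cells are found.
precision_true = tp / (tp + fp)
recall_true = tp / (tp + fn)

# Class "False": the same quantities with the roles of the classes swapped.
precision_false = tn / (tn + fn)
recall_false = tn / (tn + fp)

print(f"accuracy        = {accuracy:.4f}")
print(f"True  P/R/F1    = {precision_true:.2f} {recall_true:.2f} "
      f"{f1_score(precision_true, recall_true):.2f}")
print(f"False P/R/F1    = {precision_false:.2f} {recall_false:.2f} "
      f"{f1_score(precision_false, recall_false):.2f}")
```

The test set holds 31 records (8 `False`, 23 `True`), so each misclassified record shifts accuracy by roughly 3.2 percentage points.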

	---

	### Feature Importance
	The model identified the following features as most influential:
	| Feature                    | Importance (%) |
	|----------------------------|----------------|
	| Longitude                  | 40.50          |
	| NO2                        | 25.81          |
	| Latitude                   | 19.43          |
	| NDWI                       | 4.85           |
	| NDVI                       | 4.60           |
	| NDTI                       | 4.41           |
	| Vegetation Index (Encoded) | 0.30           |
	| Land Elevation             | 0.10           |
	| PM10                       | 0.00           |
	| CO                         | 0.00           |
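Summing the importances by feature type makes it easier to see how much of the model's signal comes from location versus environmental measurements. The grouping below is an illustrative assumption and is not defined by the model itself; the values are copied from the table.

```python
# Importance values (%) copied from the feature importance table.
importance = {
    "Longitude": 40.50,
    "NO2": 25.81,
    "Latitude": 19.43,
    "NDWI": 4.85,
    "NDVI": 4.60,
    "NDTI": 4.41,
    "Vegetation Index (Encoded)": 0.30,
    "Land Elevation": 0.10,
    "PM10": 0.00,
    "CO": 0.00,
}

# Hypothetical grouping chosen for illustration only.
groups = {
    "location": ["Longitude", "Latitude"],
    "air quality": ["NO2", "PM10", "CO"],
    "spectral indices": ["NDWI", "NDVI", "NDTI"],
    "other": ["Vegetation Index (Encoded)", "Land Elevation"],
}

# Sum the importance of every feature in each group.
group_share = {
    name: round(sum(importance[feature] for feature in features), 2)
    for name, features in groups.items()
}

for name, share in sorted(group_share.items(), key=lambda item: -item[1]):
    print(f"{name:<17} {share:6.2f}%")
```

With these values, the two coordinates together account for 59.93% of total importance, NO2 for 25.81%, and the three spectral indices for 13.86%.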

	---

	### Usage Instructions
	To use this model:
	1. Prepare your dataset with the specified input features.
	2. Ensure feature names match the training dataset.
	3. Run predictions using the following script:

	```python
	import joblib
	import pandas as pd

	# Load the model
	model = joblib.load("emit_model_catboost.joblib")

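	# Load your data; column names must match those used in training
	data = pd.read_csv("path/to/your/data.csv")

	# Predict mining viability for each row (True = viable, False = non-viable)
	predictions = model.predict(data)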
	```
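Step 2 above can be checked before calling `predict`. The sketch below reports which expected columns are absent from the input. The column names in `EXPECTED_COLUMNS` are hypothetical, since this model card lists the features but not the exact names used in training; replace them with the real ones.

```python
from typing import Iterable, List

# Hypothetical column names (assumption): substitute the names used in training.
EXPECTED_COLUMNS = [
    "latitude", "longitude", "ndvi", "ndwi", "ndti",
    "land_elevation", "vegetation_index", "no2", "pm10", "co",
]


def missing_columns(columns: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the expected column names that are not present, in expected order."""
    present = set(columns)
    return [name for name in expected if name not in present]


# Example with a plain list of names; with pandas, pass `data.columns` instead.
example = ["latitude", "longitude", "ndvi", "ndwi", "ndti", "no2", "pm10", "co"]
missing = missing_columns(example, EXPECTED_COLUMNS)
if missing:
    print("Missing columns:", ", ".join(missing))
```

Once nothing is missing, selecting `data[EXPECTED_COLUMNS]` before prediction also puts the columns in a fixed order and drops any extras.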

	---

	### Authors
	- Joseph Ackon
	- Felix Kudjo Mlagada
	- Aristotle Mbroh
	- Prince Mawuko Dzorkpe
	- Manford Ehuntem

	Acknowledgments:
	Thanks to Takoradi Technical University, Data Hackathon Ghana Statistical Service (2024), and StatsBank for their support.

	---

	This version of the EMIT model is optimized with CatBoost for better performance on mixed-type datasets.