---
license: apache-2.0
metrics:
- accuracy
pipeline_tag: tabular-classification
library_name: ml-agents
---

# EMIT Model - Environmental Monitoring and Intelligence Tool

## Title

**EMIT Model** - Environmental Monitoring and Intelligence Tool

## Overview

The **EMIT Model** (Environmental Monitoring and Intelligence Tool) is an XGBoost classifier designed to predict potential mining areas by analyzing environmental data. Developed as part of the **EMiTAL** (Environmental Monitoring and Intelligence Tool Algorithm) framework, it combines **Remote Sensing** techniques with **RayCasting** and **Polygon Gridding** to identify viable mining zones with high precision.

### Goal

The goal of this model is to support decision-making in mining by providing a predictive tool that identifies areas with high mining potential based on environmental characteristics. It is intended for regulatory bodies, mining companies, and environmental agencies looking to balance resource extraction with ecological sustainability.

## Framework: EMiTAL

The **EMiTAL framework** serves as the foundation for this model. It combines the following algorithms and data collection methods:

- **Remote Sensing**: Gathers extensive environmental data on a regional scale, including soil, vegetation, and atmospheric readings.
- **RayCasting and Polygon Gridding (RGP)**: Segments geographic regions into grids, allowing precise localization of the areas under study.
- **Environmental Indicators**: The model leverages data such as:
  - **NDVI (Normalized Difference Vegetation Index)**: Captures vegetation health.
  - **NDWI (Normalized Difference Water Index)**: Measures surface moisture and water presence.
  - **NDTI (Normalized Difference Tillage Index)**: Identifies soil disturbance and the impact of human activity.
  - **Air Quality Metrics**: NO2, PM10, and CO concentrations to assess environmental impact factors.

### Model Pipeline

The pipeline includes preprocessing and feature engineering stages that prepare the environmental data for classification. The EMiTAL framework uses the RGP algorithm for precise location detection, and the remote sensing data covers all latitudes and longitudes under consideration.

## Model Specifications

- **Model Type**: XGBoost Classifier
- **Objective**: Binary classification to determine whether a region is suitable for mining (`True` for viable, `False` for non-viable).
- **Training Metrics**:
  - **Accuracy Score**: Proportion of correct predictions.
  - **Mean Absolute Error (MAE)**: Average prediction error.
  - **R-Squared**: How well the features explain the variability of the label.
  - **High-Confidence Accuracy**: Accuracy for predictions with a confidence level above 90%.
- **Data Split**:
  - **Training**: 70%
  - **Validation**: 20%
  - **Testing**: 10%

## Input Data Features

- **Latitude** and **Longitude** coordinates for precise geolocation.
- **Vegetation Index**: Indicates vegetation density and type.
- **NDVI, NDWI, NDTI**: Capture soil and surface characteristics crucial for identifying mining areas.
- **Land Elevation**: Provides terrain insights.
- **NO2, PM10, CO**: Environmental pollution metrics critical for impact analysis.

## Usage Instructions

To use this model, prepare your dataset with the required environmental features. Ensure the feature names match those used in the training dataset for optimal results.
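The exact feature schema is defined by the EMiTAL data export and is not published in this card, so the snippet below is only a hedged sketch of one plausible input layout covering the documented features (latitude, longitude, vegetation index, NDVI, NDWI, NDTI, land elevation, NO2, PM10, CO). Every column name, unit, and value shown here is an assumption for illustration, not the model's confirmed interface.

```python
import pandas as pd

# Hypothetical example of the expected input layout; the real column names
# and units must match the training data produced by the EMiTAL pipeline.
sample = pd.DataFrame(
    {
        "latitude": [5.301],         # decimal degrees (assumed)
        "longitude": [-1.992],       # decimal degrees (assumed)
        "vegetation_index": [0.42],  # vegetation density/type indicator
        "ndvi": [0.55],              # Normalized Difference Vegetation Index
        "ndwi": [0.12],              # Normalized Difference Water Index
        "ndti": [0.08],              # Normalized Difference Tillage Index
        "land_elevation": [142.0],   # terrain elevation (unit assumed)
        "no2": [18.5],               # NO2 concentration
        "pm10": [31.0],              # PM10 concentration
        "co": [0.4],                 # CO concentration
    }
)
print(sample.dtypes)
```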
Predictions can be generated using the pre-trained model with the following script:

```python
import joblib
import pandas as pd

# Load the model
model = joblib.load("emit_model.joblib")

# Load your data (preprocess it so the columns match the training feature schema)
data = pd.read_csv("path/to/your/data.csv")
predictions = model.predict(data)
print(predictions)
```

## Model Performance Metrics

The EMIT Model's performance was evaluated using several metrics, giving insight into its accuracy, error rate, and ability to generalize. Below are the primary metrics observed during testing (a hedged sketch for recomputing them on a labelled hold-out set appears near the end of this card):

- **Accuracy**: 0.8125 — the proportion of correct predictions out of all predictions; the model predicted the target class correctly 81.25% of the time.
- **Mean Absolute Error (MAE)**: 0.1875 — the average difference between predicted and actual values; for binary labels this corresponds to a misclassification rate of approximately 18.75%.
- **R-Squared**: 0.238 — the coefficient of determination; roughly 23.8% of the variance in the target variable is explained by the model's features.

### Classification Report

The classification report details precision, recall, and F1-score for both classes (`True` and `False`):

| Class     | Precision | Recall | F1-score | Support |
|-----------|-----------|--------|----------|---------|
| **False** | 1.00      | 0.57   | 0.73     | 7       |
| **True**  | 0.75      | 1.00   | 0.86     | 9       |

- **Precision** measures the accuracy of the positive predictions.
- **Recall** is the proportion of actual positives that the model correctly identified.
- **F1-score** is the harmonic mean of precision and recall, providing a balanced view of accuracy for each class.

**Overall Classification Report**:

- **Accuracy**: 0.81 (81%)
- **Macro Average**: Precision = 0.88, Recall = 0.79, F1-score = 0.79
- **Weighted Average**: Precision = 0.86, Recall = 0.81, F1-score = 0.80

### Confusion Matrix

The confusion matrix below shows the true positive, true negative, false positive, and false negative counts:

|                  | Predicted False | Predicted True |
|------------------|-----------------|----------------|
| **Actual False** | 4               | 3              |
| **Actual True**  | 0               | 9              |

This matrix indicates:

- **True Positives (TP)**: 9
- **True Negatives (TN)**: 4
- **False Positives (FP)**: 3
- **False Negatives (FN)**: 0

The model shows high recall for the `True` class, correctly identifying all actual `True` instances (recall of 1.00), but lower recall for the `False` class (0.57): three non-viable regions were misclassified as viable, which also lowers precision for the `True` class to 0.75.

## Authors and Team

Developed by **Team Explorers**, a collaborative effort aimed at advancing predictive environmental analysis:

1. Joseph Ackon
2. Felix Kudjo Mlagada
3. Aristotle Mbroh
4. Prince Mawuko Dzorkpe
5. Manford Ehuntem

### Acknowledgments

Our work would not have been possible without the support and resources provided by:

1. **Takoradi Technical University**
2. **Data Hackathon Ghana Statistical Service, 2024**

The dataset was created using the **EMiTAL architecture**, complemented by insights from **StatsBank** and **Common Data Resources**.

## Model Repository and Future Development

This model is hosted on Hugging Face, enabling others to leverage it for environmental mining predictions. Future updates may include refined data collection techniques and the incorporation of additional environmental variables.
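For users who want to check the model against their own labelled data before relying on it, the reported metrics can be recomputed with scikit-learn. The following is a minimal sketch, not the team's original evaluation script: it assumes a hypothetical labelled CSV with a `viable` target column encoded as 0/1, the `emit_model.joblib` file from the usage section, and a "high-confidence" threshold of a 0.90 predicted class probability (our interpretation of the high-confidence accuracy metric).

```python
import joblib
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    mean_absolute_error,
    r2_score,
)

# Hypothetical file and column names -- adjust to your own labelled dataset.
model = joblib.load("emit_model.joblib")
df = pd.read_csv("path/to/labelled_data.csv")
X = df.drop(columns=["viable"])  # feature columns, matching the training schema
y = df["viable"].astype(int)     # 1 = viable, 0 = non-viable (assumed encoding)

pred = model.predict(X)          # assumed to return 0/1 or boolean labels

print("Accuracy:", accuracy_score(y, pred))
print("MAE:", mean_absolute_error(y, pred))
print("R-squared:", r2_score(y, pred))
print(classification_report(y, pred))
print(confusion_matrix(y, pred))

# "High-confidence accuracy": accuracy restricted to predictions whose
# top predicted class probability exceeds 0.90 (interpretation assumed).
proba = model.predict_proba(X).max(axis=1)
confident = proba > 0.90
if confident.any():
    print("High-confidence accuracy:",
          accuracy_score(y[confident], pred[confident]))
```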
---

### How to Contribute

If you would like to contribute to this project, please reach out to the team for collaboration opportunities. We welcome insights, data contributions, and suggestions for model improvements.