aai540-group3/diabetes-readmission
Model Description
This model predicts 30-day hospital readmissions for diabetic patients from historical patient data. It aims to identify high-risk individuals, enabling targeted interventions and better allocation of healthcare resources.
Overview
- Task: Binary Classification (Hospital Readmission Prediction)
- Model Type: AutoGluon
- Framework: AutoGluon (Python)
- License: MIT
- Last Updated: 2024-10-29
Performance Metrics
- Test Accuracy: 0.8865
- Test ROC-AUC: 0.6467
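The two metrics above measure different things, and a small example may help when reading them together. The sketch below uses plain Python and made-up labels and scores (not this model's predictions) to show how accuracy can stay high on an imbalanced outcome while ROC-AUC, which measures ranking quality, stays close to 0.5. The class balance is an assumption for illustration; this card does not state the test-set readmission rate.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative (ties count as half), i.e. the Mann-Whitney U formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only: 2 positives out of 10 cases.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.05, 0.30, 0.15, 0.25, 0.12, 0.08, 0.22, 0.11]

# Thresholding at 0.5 predicts "not readmitted" for every case.
y_pred = [int(s >= 0.5) for s in scores]

acc = accuracy(y_true, y_pred)  # driven entirely by the majority class
auc = roc_auc(y_true, scores)   # threshold-independent ranking quality
print(f"accuracy={acc:.2f}, roc_auc={auc:.4f}")
```

In this toy case the classifier never flags a readmission yet still scores 0.80 accuracy, which is why ROC-AUC is the more informative of the two numbers when positives are rare.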
Feature Importance
Features with non-zero importance and their scores:
Feature | Importance | p-value | 99% CI |
---|---|---|---|
0 | 0.0563 | 3.24e-04 | [0.0294, 0.0832] |
1 | 0.0358 | 8.45e-06 | [0.0290, 0.0426] |
2 | 0.0080 | 0.0083 | [-0.0013, 0.0173] |
3 | 0.0046 | 1.96e-04 | [0.0027, 0.0065] |
4 | 0.0023 | 0.0055 | [-0.0001, 0.0046] |
5 | 0.0008 | 0.1840 | [-0.0027, 0.0043] |
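Note: Only features with non-zero importance are shown, listed by index rather than by name. Confidence intervals (CI) are calculated at the 99% level. Features with p-value < 0.05 are considered statistically significant; by that threshold feature 5 (p = 0.1840) is not significant, and the 99% CIs for features 2 and 4 include zero.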
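The p-value threshold and the 99% interval are two different criteria, and they do not select the same rows. As a minimal sketch, the snippet below copies the values from the table and applies both checks; the variable names are illustrative.

```python
# Rows copied from the table: (feature index, importance, p-value, CI low, CI high).
rows = [
    (0, 0.0563, 3.24e-04, 0.0294, 0.0832),
    (1, 0.0358, 8.45e-06, 0.0290, 0.0426),
    (2, 0.0080, 0.0083, -0.0013, 0.0173),
    (3, 0.0046, 1.96e-04, 0.0027, 0.0065),
    (4, 0.0023, 0.0055, -0.0001, 0.0046),
    (5, 0.0008, 0.1840, -0.0027, 0.0043),
]

ALPHA = 0.05  # significance threshold stated in the note

# Criterion 1: p-value below the stated threshold.
significant = [idx for idx, _, p, _, _ in rows if p < ALPHA]

# Criterion 2: the 99% interval lies entirely on one side of zero.
ci_excludes_zero = [idx for idx, _, _, lo, hi in rows if lo > 0 or hi < 0]

print("p < 0.05:", significant)
print("99% CI excludes zero:", ci_excludes_zero)
```

The interval check is the stricter of the two here, since a 99% interval corresponds to a tighter threshold than 0.05.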
Features
Numeric Features
- Patient demographics (age)
- Hospital stay metrics (time_in_hospital, num_procedures, num_lab_procedures)
- Medication metrics (num_medications, total_medications)
- Service utilization (number_outpatient, number_emergency, number_inpatient)
- Diagnostic information (number_diagnoses)
Binary Features
- Patient characteristics (gender)
- Medication flags (diabetesmed, change, insulin_with_oral)
Interaction Features
- Time-based interactions (medications × time, procedures × time)
- Complexity indicators (age × diagnoses, medications × procedures)
- Resource utilization (lab procedures × time, medications × changes)
Ratio Features
- Resource efficiency (procedure/medication ratio, lab/procedure ratio)
- Diagnostic density (diagnosis/procedure ratio)
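The interaction and ratio features listed above can be sketched as simple products and quotients of the numeric inputs. This is an illustration under stated assumptions, not the authors' pipeline: the output column names are hypothetical, `age` is assumed to be already numeric, and the `+ 1` in each denominator is an assumed guard against division by zero.

```python
def engineer_features(record):
    """Return a copy of `record` with illustrative interaction and ratio features.

    Output names are hypothetical; the +1 in each denominator is an
    assumed guard against division by zero.
    """
    out = dict(record)

    # Interaction features: products of two numeric inputs.
    out["medications_x_time"] = record["num_medications"] * record["time_in_hospital"]
    out["procedures_x_time"] = record["num_procedures"] * record["time_in_hospital"]
    out["age_x_diagnoses"] = record["age"] * record["number_diagnoses"]
    out["lab_procedures_x_time"] = record["num_lab_procedures"] * record["time_in_hospital"]

    # Ratio features: one count relative to another.
    out["procedure_medication_ratio"] = record["num_procedures"] / (record["num_medications"] + 1)
    out["lab_procedure_ratio"] = record["num_lab_procedures"] / (record["num_procedures"] + 1)
    out["diagnosis_procedure_ratio"] = record["number_diagnoses"] / (record["num_procedures"] + 1)
    return out


# Made-up patient record for illustration.
example = {
    "age": 65,
    "time_in_hospital": 4,
    "num_procedures": 1,
    "num_lab_procedures": 40,
    "num_medications": 15,
    "number_diagnoses": 8,
}
features = engineer_features(example)
print(features["medications_x_time"], features["lab_procedure_ratio"])
```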
Intended Use
This model is designed for healthcare professionals to assess the risk of 30-day readmission for diabetic patients. It should be used as a supportive tool in conjunction with clinical judgment.
Primary Intended Uses
- Predict likelihood of 30-day hospital readmission
- Support resource allocation and intervention planning
- Aid in identifying high-risk patients
- Assist in care management decision-making
Out-of-Scope Uses
- Non-diabetic patient populations
- Predicting readmissions beyond 30 days
- Making final decisions without clinical oversight
- Use as sole determinant for patient care decisions
- Emergency or critical care decision-making
Training Data
The model was trained on the Diabetes 130-US Hospitals Dataset (1999-2008) from the UCI Machine Learning Repository. This dataset includes:
- Over 100,000 hospital admissions
- 50+ features, including patient demographics, diagnoses, and procedures
- Binary outcome: readmission within 30 days
- Comprehensive medication tracking
- Detailed hospital utilization metrics
Training Procedure
Data Preprocessing
- Missing value imputation using mean/mode
- Outlier handling using 5-sigma clipping
- Feature scaling using StandardScaler
- Categorical encoding using one-hot encoding
- Log transformation for skewed features
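For a single numeric column, the imputation, clipping, log, and scaling steps above could look roughly like the following. This is a standard-library sketch rather than the project's code: the function names are hypothetical, the order of the steps is an assumption, and `log1p` is assumed as the log transform for non-negative counts.

```python
import math
import statistics


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def clip_sigma(values, n_sigma=5.0):
    """Clip values to mean +/- n_sigma population standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    lo, hi = mean - n_sigma * std, mean + n_sigma * std
    return [min(max(v, lo), hi) for v in values]


def log_transform(values):
    """log(1 + x), assumed here for non-negative skewed counts."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Zero mean, unit variance using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


# Made-up column: one missing value and one extreme outlier.
raw = [1.0] * 48 + [None, 1000.0]
filled = impute_mean(raw)
clipped = clip_sigma(filled)
scaled = standardize(log_transform(clipped))
print(max(filled), round(max(clipped), 2))
```

When applied to real data, the mean, standard deviation, and clipping bounds would normally be estimated on the training split and reused for validation and test data, so that no information leaks from held-out rows.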
Feature Engineering
- Created interaction terms between key variables
- Generated resource utilization ratios
- Aggregated medication usage metrics
- Developed time-based interaction features
- Constructed diagnostic density metrics
Model Training
- Data split: 70% training, 15% validation, 15% test
- Cross-validation for model selection
- Hyperparameter optimization via grid search
- Early stopping to prevent overfitting
- Model selection based on ROC-AUC performance
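A 70/15/15 split like the one above can be sketched with the standard library. Stratifying by the outcome label is an assumption made here so that each split keeps a similar readmission rate; the card does not say whether the original split was stratified. The function name, seed, and toy labels are illustrative.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists, preserving class proportions."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Made-up labels: 10% positives.
labels = [1] * 200 + [0] * 1800
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```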
Limitations & Biases
Known Limitations
- Model performance depends on data quality and completeness
- Limited to the scope of training data timeframe (1999-2008)
- May not generalize to significantly different healthcare systems
- Requires standardized input data format
Potential Biases
- May exhibit demographic biases present in training data
- Performance may vary across different hospital systems
- Could be influenced by regional healthcare practices
- Might show temporal biases due to historical data
Recommendations
- Regular model monitoring and retraining
- Careful validation in new deployment contexts
- Assessment of performance across demographic groups
- Integration with existing clinical workflows
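Assessing performance across demographic groups amounts to computing the same metric separately for each group. A minimal sketch with made-up labels and predictions follows; the helper name and the group values are hypothetical, and the same grouping logic applies to other metrics such as recall or ROC-AUC.

```python
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Return {group: (n, accuracy)} for each distinct group value."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_correct, n_total]
    for g, t, p in zip(groups, y_true, y_pred):
        counts[g][0] += int(t == p)
        counts[g][1] += 1
    return {g: (total, correct / total) for g, (correct, total) in counts.items()}


# Illustrative data only.
groups = ["F", "F", "F", "M", "M", "M", "M", "F"]
y_true = [0, 1, 0, 0, 1, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1]

report = accuracy_by_group(groups, y_true, y_pred)
for group, (n, acc) in sorted(report.items()):
    print(f"{group}: n={n}, accuracy={acc:.2f}")
```

Reporting the group size alongside the metric matters because small groups produce noisy estimates.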
Monitoring & Maintenance
Monitoring Requirements
- Track prediction accuracy across different patient groups
- Monitor input data distribution shifts
- Assess feature importance stability
- Evaluate performance metrics over time
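One common way to monitor input distribution shift is the population stability index (PSI), which compares the binned distribution of a feature in current data against a reference sample. The card does not name a drift metric, so PSI is an assumption here, as are the function name, the equal-width binning, and the sample values.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature.

    Bins are equal-width over the reference range; values outside that
    range fall into the first or last bin. `eps` avoids log(0).
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            b = int((v - lo) / width)
            counts[min(max(b, 0), n_bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Made-up length-of-stay values: uniform over 1..10 days.
reference = [d for d in range(1, 11) for _ in range(10)]
shifted = [d + 3 for d in reference]  # every stay three days longer

psi_same = population_stability_index(reference, list(reference))
psi_shift = population_stability_index(reference, shifted)
print(f"unchanged: {psi_same:.3f}, shifted: {psi_shift:.3f}")
```

Identical samples give a PSI of zero, and larger values indicate a larger shift; the alert threshold is a deployment decision that this sketch does not set.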
Maintenance Schedule
- Quarterly performance reviews recommended
- Annual retraining with updated data
- Regular bias assessments
- Ongoing validation against current practices
Citation
@misc{diabetes-readmission-model,
title = {Hospital Readmission Prediction Model for Diabetic Patients},
author = {Agustin, Jonathan and Robertson, Zack and Vo, Lisa},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/aai540-group3/diabetes-readmission}}
}
@misc{diabetes-dataset,
title = {Diabetes 130-US Hospitals for Years 1999-2008 Data Set},
author = {Strack, B. and DeShazo, J. and Gennings, C. and Olmo, J. and
Ventura, S. and Cios, K. and Clore, J.},
year = {2014},
publisher = {UCI Machine Learning Repository},
doi = {10.24432/C5230J}
}
Model Card Authors
Jonathan Agustin, Zack Robertson, Lisa Vo
For Questions, Issues, or Feedback
- GitHub Issues: Repository Issues
- Email: [team contact information]
Updates and Versions
- 2024-10-29: Initial model release
  - Feature engineering pipeline implemented
  - Comprehensive preprocessing system added
  - Model evaluation and selection completed
Last updated: 2024-10-29