rotemvahava committed on
Commit d05c0f7 · verified · 1 Parent(s): 1a25ef7

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -40,7 +40,7 @@ pipeline_tag: tabular-classification
40
 
41
  ## Project Overview
42
 
43
- This project builds a complete end-to-end machine learning pipeline that predicts developer compensation using the Stack Overflow Developer Survey 2023 (~89,000 developers worldwide). The pipeline progresses from raw data through exploratory analysis, feature engineering, unsupervised clustering, regression modeling, and multi-class classification ending with two production-ready Gradient Boosting models exported to this repository.
44
 
45
  The same dataset is used for two prediction tasks:
46
  - **Regression** — predicting the exact annual salary in USD.
@@ -337,7 +337,9 @@ print(f"Class probabilities: {dict(zip(clf_model.classes_, predicted_proba[0]))}
337
  ## Dataset
338
 
339
  **Source:** [Stack Overflow Developer Survey 2023](https://www.kaggle.com/datasets/stackoverflow/stack-overflow-2023-developers-survey) on Kaggle
340
- **Original size:** ~89,000 respondents, 16 features used
 
 
341
  **After cleaning:** 45,804 developers with valid salary data
342
 
343
  ---
 
40
 
41
  ## Project Overview
42
 
43
+ This project builds a complete end-to-end machine learning pipeline that predicts developer compensation using the Stack Overflow Developer Survey 2023 — a dataset of ~89,000 developers worldwide with 84 raw features. From those, I selected 16 features most relevant to salary prediction and ended up with 45,804 developers after cleaning.
44
 
45
  The same dataset is used for two prediction tasks:
46
  - **Regression** — predicting the exact annual salary in USD.
 
337
  ## Dataset
338
 
339
  **Source:** [Stack Overflow Developer Survey 2023](https://www.kaggle.com/datasets/stackoverflow/stack-overflow-2023-developers-survey) on Kaggle
340
+ **Original size:** ~89,000 respondents, 84 features in the raw dataset
341
+ **Selected for analysis:** 16 features chosen for relevance to salary prediction
342
+ **After cleaning:** 45,804 developers with valid salary data ($5K–$500K range)
343
  **After cleaning:** 45,804 developers with valid salary data
344
 
345
  ---
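
The updated Dataset section says cleaning keeps developers with valid salary data in the $5K to $500K range. A minimal sketch of what such a filter might look like is below. It is not the author's code: the column name `ConvertedCompYearly`, the inclusive bounds, the helper names, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Assumed name of the salary column; the README does not name it.
SALARY_COL = "ConvertedCompYearly"
# Bounds taken from the README's stated range; treating them as inclusive is an assumption.
LOW, HIGH = 5_000, 500_000


def parse_salary(raw):
    """Return the salary as a float, or None if it is missing or non-numeric."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def filter_valid_salaries(rows, low=LOW, high=HIGH):
    """Keep only rows whose salary parses and lies within [low, high]."""
    kept = []
    for row in rows:
        salary = parse_salary(row.get(SALARY_COL))
        if salary is not None and low <= salary <= high:
            kept.append(row)
    return kept


# Made-up sample rows, not taken from the survey.
sample_csv = """ConvertedCompYearly,Country
1000,A
85000,B
NA,C
750000,D
500000,E
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
cleaned = filter_valid_salaries(rows)
print(len(cleaned), "of", len(rows), "rows kept")
```

Rows with a missing salary or one outside the bounds are dropped, which is the kind of step that would reduce ~89,000 respondents to the 45,804 reported after cleaning.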