Tabular Classification
Scikit-learn
English
regression
classification
salary-prediction
stack-overflow
gradient-boosting
random-forest
logistic-regression
clustering
feature-engineering
tabular
Instructions to use rotemvahava/stackoverflow-salary-predictor with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Scikit-learn
How to use rotemvahava/stackoverflow-salary-predictor with Scikit-learn:
```python
from huggingface_hub import hf_hub_download
import joblib

model = joblib.load(
    hf_hub_download("rotemvahava/stackoverflow-salary-predictor", "sklearn_model.joblib")
)
# only load pickle files from sources you trust
# read more about it here https://skops.readthedocs.io/en/stable/persistence.html
```
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
```diff
@@ -40,7 +40,7 @@ pipeline_tag: tabular-classification
 
 ## Project Overview
 
-This project builds a complete end-to-end machine learning pipeline that predicts developer compensation using the Stack Overflow Developer Survey 2023
+This project builds a complete end-to-end machine learning pipeline that predicts developer compensation using the Stack Overflow Developer Survey 2023 — a dataset of ~89,000 developers worldwide with 84 raw features. From those, I selected 16 features most relevant to salary prediction and ended up with 45,804 developers after cleaning.
 
 The same dataset is used for two prediction tasks:
 - **Regression** — predicting the exact annual salary in USD.
@@ -337,7 +337,9 @@ print(f"Class probabilities: {dict(zip(clf_model.classes_, predicted_proba[0]))}
 ## Dataset
 
 **Source:** [Stack Overflow Developer Survey 2023](https://www.kaggle.com/datasets/stackoverflow/stack-overflow-2023-developers-survey) on Kaggle
-**Original size:** ~89,000 respondents,
+**Original size:** ~89,000 respondents, 84 features in the raw dataset
+**Selected for analysis:** 16 features chosen for relevance to salary prediction
+**After cleaning:** 45,804 developers with valid salary data ($5K–$500K range)
 **After cleaning:** 45,804 developers with valid salary data
 
 ---
```
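The updated README describes the cleaning step only in outline: keep a subset of the raw features and retain respondents whose salary is valid and falls in the $5K to $500K range. The sketch below illustrates one way such a filter could look. It is not the author's code: the function name `clean_rows`, the salary column name `ConvertedCompYearly`, the example feature names, and the choice of inclusive bounds are all assumptions made for illustration, and the rows are plain dictionaries so the example runs with the standard library alone.

```python
# Hypothetical sketch of the cleaning step described in the README.
# Assumptions (not confirmed by the source): the salary column name,
# the selected feature names, and inclusive range bounds.

SALARY_COLUMN = "ConvertedCompYearly"  # assumed name of the annual-salary column
MIN_SALARY = 5_000     # lower bound from the README's "$5K" figure
MAX_SALARY = 500_000   # upper bound from the README's "$500K" figure


def clean_rows(rows, selected_columns, salary_column=SALARY_COLUMN,
               low=MIN_SALARY, high=MAX_SALARY):
    """Keep rows with a numeric salary in [low, high], restricted to selected columns."""
    cleaned = []
    for row in rows:
        try:
            salary = float(row.get(salary_column))
        except (TypeError, ValueError):
            continue  # salary missing or not numeric
        if not (low <= salary <= high):
            continue  # salary outside the accepted range (also drops NaN)
        kept = {col: row.get(col) for col in selected_columns}
        kept[salary_column] = salary
        cleaned.append(kept)
    return cleaned


# Toy rows standing in for survey responses (values are made up).
rows = [
    {"Country": "Germany", "YearsCodePro": "5", "ConvertedCompYearly": "72000", "Extra": "x"},
    {"Country": "India", "YearsCodePro": "2", "ConvertedCompYearly": "NA", "Extra": "y"},
    {"Country": "USA", "YearsCodePro": "10", "ConvertedCompYearly": "950000", "Extra": "z"},
    {"Country": "Brazil", "YearsCodePro": "1", "ConvertedCompYearly": "3000", "Extra": "w"},
]
cleaned = clean_rows(rows, selected_columns=["Country", "YearsCodePro"])
print(len(cleaned), cleaned)
```

In this toy input only the first row survives: the second has no numeric salary, and the third and fourth fall outside the assumed range. The same logic would apply to rows read with `csv.DictReader`, or could be expressed as a boolean mask over a data frame.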