rotemvahava committed on
Commit 012b116 · verified · 1 Parent(s): 62eb75e

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -71,7 +71,9 @@ The EDA started with cleaning the data: imputing missing values (median for nume
 
 The raw salary column had extreme values that would have distorted any model — entries below $5K (probably typos or freelance side gigs) and above $500K (likely C-level executives or data entry errors). I capped the salary range to $5K–$500K, which removed the noise while keeping the meaningful tail of high earners.
 
-![Outlier Cleaning](outliers_cleaning.png)
+
+![image](https://cdn-uploads.huggingface.co/production/uploads/69d8c4ebac66ae8128c346d8/kbq48nMVx4QsTC4ppcNIj.png)
+
 
 After this filtering, I was left with 45,804 developers with reliable salary data.
 
@@ -290,7 +292,7 @@ Gradient Boosting wins consistently across all three classes — AUC = 0.909 for
 
 Precision-Recall curves complement the ROC analysis with another view on model quality, especially useful when looking at the tradeoff between catching actual positives (recall) and being right when predicting positives (precision).
 
-![Precision-Recall Curves](precision_recall_curves.png)
+![image](https://cdn-uploads.huggingface.co/production/uploads/69d8c4ebac66ae8128c346d8/JZ8pzJDRpAPeoU9QyCsnX.png)
 
 The pattern matches what I saw in ROC: Gradient Boosting wins across all three classes — AP of 0.832 for Low, 0.635 for Mid, and 0.858 for High. Logistic Regression is right behind, and Random Forest comes in last but only by a small margin. The Mid class consistently has the lowest AP (~0.61–0.64) across all models, confirming the same pattern from ROC and the confusion matrices: Mid sits between Low and High without clean boundaries, so it's harder for any model to be both precise and complete about it. Even the worst Mid curve at AP = 0.61 is nearly twice the no-skill baseline of 0.33, confirming the models add real predictive value.
 
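The README passages around these two image swaps describe two steps: dropping salaries outside $5K–$500K, and comparing per-class average precision (AP) for Low, Mid and High against a no-skill baseline. Below is a minimal, stdlib-only sketch of both, written under stated assumptions rather than as the author's code. The function names (`filter_salaries`, `average_precision`, `no_skill_ap`, `one_vs_rest_ap`) are hypothetical. The salary bounds are assumed to be inclusive. Out-of-range rows are assumed to be dropped rather than clipped, which is consistent with "After this filtering". AP follows the standard non-interpolated definition, which scikit-learn's `average_precision_score` also uses.

```python
from typing import Dict, Iterable, List, Sequence

# Bounds taken from the README text; treating them as inclusive is an assumption.
SALARY_MIN = 5_000
SALARY_MAX = 500_000


def filter_salaries(salaries: Iterable[float],
                    low: float = SALARY_MIN,
                    high: float = SALARY_MAX) -> List[float]:
    """Keep only salaries inside [low, high]; out-of-range rows are dropped, not clipped."""
    return [s for s in salaries if low <= s <= high]


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: the mean of precision@k taken at the rank k of each positive.

    Equals sum_n (R_n - R_{n-1}) * P_n when no two scores are tied.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AP is undefined when there are no positive examples")
    # Rank examples from most to least confident.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / k  # precision at this cutoff
    return precision_sum / n_pos


def no_skill_ap(y_true: Sequence[int]) -> float:
    """No-skill PR baseline: a flat line at precision = positive-class prevalence."""
    return sum(y_true) / len(y_true)


def one_vs_rest_ap(labels: Sequence[str],
                   probs: Sequence[Sequence[float]],
                   classes: Sequence[str]) -> Dict[str, float]:
    """Score each class against the rest, using that class's column of predicted probabilities."""
    return {
        cls: average_precision([int(lab == cls) for lab in labels],
                               [row[i] for row in probs])
        for i, cls in enumerate(classes)
    }


if __name__ == "__main__":
    print(filter_salaries([1_000, 5_000, 120_000, 500_000, 900_000]))  # [5000, 120000, 500000]
    # One class out of three equal-sized classes: prevalence is 1/3.
    print(no_skill_ap([0, 1, 0, 0, 1, 0]))
```

Each salary class is scored one-vs-rest. If the three classes are roughly equal in size, `no_skill_ap` comes out near 1/3, which matches the 0.33 baseline quoted in the README.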