arxiv:2401.08348

We don't need no labels: Estimating post-deployment model performance under covariate shift without ground truth

Published on Jan 16

Authors:

Abstract

The performance of machine learning models often degrades after deployment due to data distribution shifts. In many use cases, it is impossible to calculate the post-deployment performance because labels are unavailable or significantly delayed. Proxy methods for evaluating model performance stability, like drift detection techniques, do not properly quantify data distribution shift impact. As a solution, we propose a robust and accurate performance estimation method for evaluating ML classification models on unlabeled data that accurately quantifies the impact of covariate shift on model performance. We call it multi-calibrated confidence-based performance estimation (M-CBPE). It is model and data-type agnostic and works for any performance metric. It does not require access to the monitored model - it uses the model predictions and probability estimates. M-CBPE does not need user input on the nature of the covariate shift as it fully learns from the data. We evaluate it with over 600 dataset-model pairs from US census data and compare it with multiple benchmarks using several evaluation metrics. Results show that M-CBPE is the best method to estimate the performance of classification models in any evaluation context.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2401.08348 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2401.08348 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2401.08348 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.