fit the model to the data well and ensure that the approximate posterior is encouraged to be as close as possible to the true posterior distribution. Even a non-Bayesian, classic NN can be interpreted in this framework as an approximate, degenerate posterior distribution, i.e., a Dirac delta function centered on the MAP estimate of the parameters, q(θ|D; φ) = δ(θ − θ̂_MAP). Informally, plugging this degenerate posterior into the variational objective reduces the expected log-likelihood term to log p(D|θ̂_MAP) and the prior term to log p(θ̂_MAP), so that, up to an additive constant that does not depend on θ̂_MAP, maximizing the objective recovers standard MAP training. Further PUQ methods based on different posterior approximations are discussed in detail in Chapter 3, along with updates on the state of the art.

2.2.6 Failure Prediction

Based on the principle of selective prediction [138, 139], failure prediction is the task of predicting whether a model will fail on a given input. Every chapter following Chapter 3 addresses this topic in the context of its respective task. Since failure prediction is an important topic in the context of IA-DU and is generating increasing interest [81, 114, 127, 193, 391], a brief overview of the unified perspective it provides is warranted; we refer the reader to [171, 536] for a comprehensive survey.

Failure prediction subsumes many related tasks in the sense that it only requires a failure source to be defined in order to form a binary classification task. The failure source can be i.i.d. mispredictions, covariate shift (e.g., input corruptions, concept drift, domain shift), a new class, domain, modality, task, or concept. The goal of failure prediction is to predict these failures before they occur, allowing for more reliable and robust ML systems.

First, note that calibration does not imply failure prediction: a model that is calibrated w.r.t. i.i.d. data can still be overconfident on OOD inputs [549]. Example 2.2.1 sketches the independent requirements of calibration and confidence ranking.

Example 2.2.1. Classifier A scores 90% accuracy on the test set, with a CSF using the entire range [0, 1]. Classifier B scores 92% accuracy on the test set, but its CSF always reports 0.92 for any input. Which classifier is preferred in a real-world setting?

• Classifier A might be less calibrated, but its CSF provides the separability needed to predict failure on a given input.
• Classifier B is perfectly calibrated (its constant confidence of 0.92 matches its 92% accuracy), but it is not possible to know whether it will fail on any given input.

For OOD failure prediction specifically, [527] provides a comprehensive categorization of failure tasks and methods.
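To make Example 2.2.1 concrete, the following is a minimal, hypothetical sketch (not from the text): it casts failure prediction as a binary classification task whose failure source is i.i.d. misprediction, uses the maximum softmax probability as an illustrative CSF, and measures separability with AUROC. Both CSFs are attached to the same synthetic predictions purely to isolate the effect of the scoring function; the data, the +2.0 logit bias, and the constant 0.92 are all assumptions.

```python
# Hypothetical sketch: failure prediction as binary classification.
# Failure source: i.i.d. mispredictions. CSF: maximum softmax probability.
# Separability metric: AUROC. All data below is synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, k = 1000, 10
y_true = rng.integers(0, k, size=n)          # ground-truth labels
logits = rng.normal(size=(n, k))
logits[np.arange(n), y_true] += 2.0          # bias logits toward the true class
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

y_pred = probs.argmax(axis=1)
failure = (y_pred != y_true).astype(int)     # 1 = misprediction, the event to predict

csf_a = probs.max(axis=1)                    # Classifier A: CSF spans [0, 1]
csf_b = np.full(n, 0.92)                     # Classifier B: constant CSF

# Score each CSF by how well (negated) confidence ranks failures.
print("AUROC, A:", roc_auc_score(failure, -csf_a))  # above 0.5: failures are separable
print("AUROC, B:", roc_auc_score(failure, -csf_b))  # exactly 0.5: a constant score carries no ranking
```

A constant CSF can be perfectly calibrated in the ECE sense while remaining at chance-level AUROC, which is precisely the independence of calibration and confidence ranking that Example 2.2.1 illustrates.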