FUNDAMENTALS
fit the model to the data well and ensure that the approximate posterior is
encouraged to be as close as possible to the true posterior distribution.
Even a non-Bayesian, classic NN can be interpreted in this framework as using an
approximate, degenerate posterior distribution, i.e., a Dirac delta function
centered on the MAP estimate of the parameters, q(θ | D; φ) = δ(θ − θ̂_MAP).
Further PUQ methods based on different posterior approximations are discussed
in detail in Chapter 3, together with updates on the state of the art.
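To make the consequence of this degenerate posterior explicit, substituting the Dirac delta into the Bayesian model average recovers the ordinary point prediction of a classic NN (a short derivation in the notation of the surrounding text):

```latex
p(y \mid x, \mathcal{D})
  = \int p(y \mid x, \theta)\, q(\theta \mid \mathcal{D}; \phi)\, \mathrm{d}\theta
  = \int p(y \mid x, \theta)\, \delta(\theta - \hat{\theta}_{\mathrm{MAP}})\, \mathrm{d}\theta
  = p(y \mid x, \hat{\theta}_{\mathrm{MAP}}),
```

by the sifting property of the delta function; such a "posterior" thus carries no parameter uncertainty into the predictive distribution.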
2.2.6 Failure Prediction
Based on the principle of selective prediction [138, 139], failure prediction is
the task of predicting whether a model will fail on a given input. Every chapter
following Chapter 3 addresses this topic in the context of the respective task.
Since it is an important topic in IA-DU that is attracting increasing interest
[81, 114, 127, 193, 391], a brief overview of how it provides a unified
perspective is warranted. We refer the reader to [171, 536] for comprehensive
surveys.
Failure prediction subsumes many related tasks: once a failure source is
defined, each reduces to a binary classification task. The failure source can
be i.i.d. mispredictions, covariate shift (e.g., input corruptions, concept
drift, domain shift), or a new class, domain, modality, task, or concept.
The goal of failure prediction is to predict these failures before they occur,
allowing for more reliable and robust ML systems.
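The binary formulation above can be sketched in code. The helper names below are illustrative, with the maximum softmax probability as one common choice of confidence scoring function (CSF) and failure labels derived from i.i.d. mispredictions:

```python
# Sketch: failure prediction as binary classification via a CSF.
# Assumed setup: max-softmax CSF, failures = i.i.d. mispredictions.
import numpy as np

def max_softmax_csf(logits):
    """CSF: maximum softmax probability per input."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

def failure_auroc(confidences, failures):
    """AUROC of the CSF at separating failures (1) from successes (0),
    via the rank-sum (Mann-Whitney U) formulation; ties broken by index."""
    order = confidences.argsort()
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(confidences) + 1)
    n_fail = failures.sum()
    n_succ = len(failures) - n_fail
    # Low confidence should indicate failure, so failures should rank low.
    u = ranks[failures == 1].sum() - n_fail * (n_fail + 1) / 2
    return 1 - u / (n_fail * n_succ)

# Toy data: confident-correct vs. unconfident-wrong predictions.
logits = np.array([[4.0, 0.1], [3.5, 0.2], [0.6, 0.5], [0.7, 0.6]])
preds = logits.argmax(axis=1)
labels = np.array([0, 0, 1, 1])                   # model errs on the last two
failures = (preds != labels).astype(int)
conf = max_softmax_csf(logits)
print(failure_auroc(conf, failures))              # high AUROC: failures separable
```

An AUROC of 0.5 would mean the CSF ranks failures no better than chance; values near 1 mean low-confidence inputs reliably coincide with failures.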
First, note that calibration does not imply failure prediction: a model
calibrated w.r.t. i.i.d. data can still be overconfident on OOD inputs [549].
Example 2.2.1 sketches the independent requirements of calibration and
confidence ranking.
Example 2.2.1. Classifier A scores 90% accuracy on the test set, with a CSF
using the entire range [0, 1]. Classifier B scores 92% accuracy on the test set,
but the CSF always reports 0.92 for any input. Which classifier is preferred in
a real-world setting?
• Classifier B is calibrated in aggregate (its constant confidence matches
its accuracy), but it is not possible to know whether it will fail on a given
input.
• Classifier A might be less calibrated, but its CSF provides the separability
needed to predict failure on a given input.
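The distinction in the example can be made concrete with assumed toy numbers: a CSF pinned to a single value can score well on calibration in aggregate yet offers no separability, while a CSF that spans [0, 1] enables failure ranking. The `ece` and `auroc` helpers below are illustrative implementations, not a specific library's API:

```python
# Toy sketch (assumed numbers): calibration vs. failure separability.
import numpy as np

def ece(conf, correct, n_bins=10):
    """Expected calibration error over equal-width confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            err += m.mean() * abs(correct[m].mean() - conf[m].mean())
    return err

def auroc(conf, correct):
    """Probability that a correct prediction outranks an incorrect one
    in confidence; ties count half (chance level = 0.5)."""
    pos, neg = conf[correct], conf[~correct]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

rng = np.random.default_rng(0)
n = 1000

# Constant CSF: confidence always 0.92, accuracy ~92% -> calibrated in aggregate.
conf_b = np.full(n, 0.92)
correct_b = rng.random(n) < 0.92

# Range-spanning CSF: calibrated by construction (accuracy tracks confidence).
conf_a = rng.uniform(0.0, 1.0, n)
correct_a = rng.random(n) < conf_a

print(f"constant CSF: ECE={ece(conf_b, correct_b):.3f}  AUROC={auroc(conf_b, correct_b):.3f}")
print(f"spanning CSF: ECE={ece(conf_a, correct_a):.3f}  AUROC={auroc(conf_a, correct_a):.3f}")
# The constant CSF's AUROC is exactly 0.5: it cannot rank failures at all.
```

Both CSFs achieve a low ECE here, but only the range-spanning one attains an AUROC above chance, mirroring the calibration-vs-ranking split in the example.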
Specific to OOD failure prediction, [527] provides a comprehensive categorization
of failure tasks and methods.