For the sake of completeness, there exist different notions of calibration, differing
in the subset of predictions considered over ∆Y [463]:
I. top-1 [156]
II. top-r [159]
III. canonical calibration [51]
Formally, a classifier f is said to be canonically calibrated iff,
P(Y = y_k | f(X) = ρ) = ρ_k   ∀k ∈ [K] ∧ ∀ρ ∈ [0, 1]^K, where K = |Y|.   (2.17)
However, this strictest notion of calibration becomes infeasible to estimate
once the output space cardinality exceeds a certain size [157], since it requires
conditioning on the entire K-dimensional probability vector ρ, and the number of
samples needed for that grows exponentially in K.
For discrete target spaces with a large number of classes, there is considerable
interest in knowing that a model is also calibrated on its less likely predictions.
Several relaxed notions of calibration have therefore been proposed, which are more
feasible to compute and allow models to be compared on a more equal footing.
These include top-label [157], top-r [159], within-top-r [159], and marginal
calibration [229, 231, 342, 492].
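To make the top-1 notion concrete, the following is a minimal NumPy sketch of the
standard confidence-calibration (expected calibration error) estimate with
equal-width bins; the function name and the bin count are illustrative choices,
not taken from the cited works.

import numpy as np

def top1_ece(probs, labels, n_bins=15):
    # probs: (N, K) predicted class probabilities; labels: (N,) integer targets
    confidences = probs.max(axis=1)        # top-1 confidence per sample
    predictions = probs.argmax(axis=1)     # top-1 predicted class
    accuracies = (predictions == labels).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # gap between empirical accuracy and mean confidence, weighted by bin mass
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece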
2.2.5 Predictive Uncertainty Quantification
Bayes’ theorem [26] is a fundamental result in probability theory, which
provides a principled way to update beliefs about an event given new evidence.
Bayesian Deep Learning (BDL) methods build on these solid mathematical
foundations and promise reliable predictive uncertainty quantification (PUQ)
[124, 136, 140, 238, 301, 325, 326, 464, 466, 496].
The Bayesian approach consists of casting learning and prediction as an
inference task about hypotheses (uncertain quantities, with θ representing
all BNN parameters: weights w, biases b, and model structure) from training
data (measurable quantities, D = {(x_i, y_i)}_{i=1}^N = (X, Y)).
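Spelled out (a standard restatement of this inference task, not specific to any
single reference above), Bayes' theorem gives the parameter posterior

p(θ | D) = p(D | θ) p(θ) / p(D),

and predictions then marginalize over it through the posterior predictive distribution

p(y* | x*, D) = ∫ p(y* | x*, θ) p(θ | D) dθ.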
Bayesian Neural Networks (BNNs) are in theory able to avoid the pitfalls
of stochastic non-convex optimization of highly non-linear functions with
many high-dimensional tunable parameters [300]. More specifically, BNNs can capture
the uncertainty in the NN parameters by learning a distribution over them,
rather than a single point estimate. This offers advantages in terms of data
efficiency, reduced overfitting thanks to the regularization induced by parameter
priors, model complexity control, and robustness to noise owing to their
probabilistic nature. However, BNNs come with their own challenges, such as the
increased computational cost of learning and inference, the difficulty of
specifying appropriate weight or function priors, and the need for specialized
training algorithms or architectural extensions.
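To illustrate how a distribution over parameters turns into predictive uncertainty,
below is a minimal NumPy sketch that Monte Carlo averages the predictive
distribution of a single linear layer under an assumed (already learned) factorized
Gaussian approximate posterior over its weights; all variable names and the toy
dimensions are illustrative, not the method of any specific reference.

import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Assumed factorized Gaussian approximate posterior q(w) = N(mu, sigma^2)
# over the weights of one linear layer mapping d inputs to K classes.
d, K = 8, 3
mu = 0.1 * rng.normal(size=(d, K))      # posterior means (toy values)
log_sigma = -2.0 * np.ones((d, K))      # posterior log standard deviations

def predictive(x, n_samples=100):
    # Monte Carlo estimate of p(y | x, D): average softmax(x @ w) over samples w ~ q(w)
    samples = []
    for _ in range(n_samples):
        w = mu + np.exp(log_sigma) * rng.normal(size=mu.shape)  # sample weights
        samples.append(softmax(x @ w))
    probs = np.mean(samples, axis=0)                         # averaged predictive distribution
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)  # predictive uncertainty
    return probs, entropy

x = rng.normal(size=(1, d))
probs, entropy = predictive(x)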