the true class. This measure more heavily penalizes sharp probabilities that are pushed towards the wrong extreme or class through over- or under-confidence.
\[
\ell_{\mathrm{NLL}}(f) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} \mathbb{I}[y_i = k] \cdot \log\big(f_k(x_i)\big) \tag{2.10}
\]
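For concreteness, the snippet below is a minimal NumPy sketch of the estimator in Eq. (2.10); the array names probs (the $N \times K$ matrix of predicted probabilities $f_k(x_i)$) and labels (the integer ground-truth classes $y_i$) are illustrative assumptions, not notation used elsewhere in this text.

\begin{verbatim}
import numpy as np

def nll(probs, labels, eps=1e-12):
    """Average negative log-likelihood, Eq. (2.10).

    probs  : (N, K) array of predicted class probabilities f_k(x_i)
    labels : (N,)   array of integer ground-truth classes y_i
    """
    # Probability assigned by the model to the true class of each sample.
    true_class_probs = probs[np.arange(len(labels)), labels]
    # eps guards against log(0) for fully confident wrong predictions.
    return float(-np.mean(np.log(true_class_probs + eps)))
\end{verbatim}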
• Brier Score [50] is a scoring rule that measures the accuracy of a probabilistic classifier and is related to the mean-squared error (MSE) loss function. The Brier score is more commonly used in industrial practice since it is an $\ell_2$ metric (score between 0 and 1), yet it penalizes tail probabilities less severely than NLL.
\[
\ell_{\mathrm{BS}}(f) = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} \big(\mathbb{I}[y_i = k] - f_k(x_i)\big)^2 \tag{2.11}
\]
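Analogously, the following is a minimal sketch of Eq. (2.11) under the same assumed probs/labels convention; the explicit one-hot encoding makes the relation to the MSE loss visible.

\begin{verbatim}
import numpy as np

def brier_score(probs, labels):
    """Multi-class Brier score, Eq. (2.11): squared error between the
    one-hot encoded label and the predicted probability vector,
    averaged over the N samples."""
    n, k = probs.shape
    one_hot = np.zeros((n, k))
    one_hot[np.arange(n), labels] = 1.0
    return float(np.mean(np.sum((one_hot - probs) ** 2, axis=1)))
\end{verbatim}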
All of the following metrics require a CSF $g(x)$ to be defined, and can pertain to specific evaluation settings [389] tested in Section 3.4.5.
Expected Calibration Error (ECE) [156, 332] is a default metric to evaluate top-1 prediction miscalibration. A calibration estimator (Definition 7) measures the $L_p$ norm difference between a model's posterior and the true likelihood of being correct.
Definition 7 ($L_p$ Calibration Error). [231, 463]
The $L_p$ calibration error of $f : X \to \Delta^Y$ over the joint distribution $(X \times Y)$ with the $L_p$ norm, $p \in [1, \infty)$, is given by:
\[
\mathrm{CE}_p(f)^p = \mathbb{E}_{(X,Y)}\big[\, \lVert \mathbb{E}[Y \mid f(X)] - f(X) \rVert_p^p \,\big] \tag{2.12}
\]
The popular ECE metric [332] with condition $\mathbb{I}[Y = \hat{y}]$ is a special case of the above with $p = 1$, where the expectation is approximated using a histogram. MaxCE defines the worst-case risk version with $p = \infty$, effectively reporting the bin with the highest error. As part of Chapter 5, we contributed a novel empirical estimator of top-1 calibration for the task of VQA, where the exact accuracy condition $\mathbb{I}[Y = \hat{y}]$ in ECE is replaced by $\mathbb{I}[\mathrm{ANLS}(y, \hat{y}) > \tau]$. Prior work [329] used a similar strategy of thresholding continuous quality scores to be able to estimate ECE.
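The next paragraph describes the standard histogram-binning estimator; a minimal NumPy sketch under the assumption of $B$ equal-width confidence bins is given below. The names confidences and correct are illustrative; passing a thresholded quality indicator such as $\mathbb{I}[\mathrm{ANLS}(y, \hat{y}) > \tau]$ as correct, instead of exact-match accuracy, yields the VQA variant discussed above.

\begin{verbatim}
import numpy as np

def ece(confidences, correct, n_bins=15):
    """Histogram-binned estimator of top-1 ECE (p = 1).

    confidences : (N,) top-1 confidence scores g(x_i) in [0, 1]
    correct     : (N,) binary correctness indicators per sample
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each sample to one of the B equal-width bins (digitizing
    # against the interior edges puts confidence 1.0 in the last bin).
    bin_ids = np.digitize(confidences, edges[1:-1])
    error = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue
        # Gap between observed accuracy and average confidence in the bin,
        # weighted by the fraction of samples that fall into the bin.
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        error += in_bin.mean() * gap
    return float(error)
\end{verbatim}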
In practice, ECE is implemented as a histogram binning estimator that discretizes predicted probabilities into ranges of possible values for which the conditional expectation can be estimated. Concretely, the probability space is partitioned into $B$ bins $b_i$ with $i \in \{1, \ldots, B\}$, where for each bin $b_i$ the gap between observed accuracy and bin confidence $\bar{P}_b$ is measured, with a final