RELIABILITY AND ROBUSTNESS
• Theoretical frameworks that generalize over existing metrics and support
the design of novel metrics [43, 231, 492, 493]
• Specializations towards a task such as multi-class classification [463],
regression [228, 428], or structured prediction [227]
• Alternative error estimation procedures, based on histogram regression
[156, 331, 332, 340, 343], kernels [230, 370, 492, 493], or splines [159]
(a minimal histogram-binned estimator is sketched after this list)
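To illustrate the histogram-based family, below is a minimal sketch of an
equal-width, histogram-binned estimator of the Expected Calibration Error
(ECE); the function name, binning scheme, and bin count are illustrative
choices, not the exact estimators of the cited works.

import numpy as np

def ece_histogram(confidences, correct, n_bins=15):
    """Equal-width histogram-binned Expected Calibration Error (ECE).

    confidences: (N,) top-class confidences in [0, 1].
    correct:     (N,) 1.0 if the top-class prediction was correct, else 0.0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Right-closed bins so that a confidence of exactly 1.0 is counted.
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        weight = in_bin.mean()                  # |B_m| / N
        avg_conf = confidences[in_bin].mean()   # mean confidence in the bin
        avg_acc = correct[in_bin].mean()        # empirical accuracy in the bin
        ece += weight * abs(avg_acc - avg_conf)
    return ece

Equal-width binning is only the simplest choice; several of the cited
estimators instead use equal-mass bins, debiasing, or regression over the
histogram to reduce the bias and variance of the estimate.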
(B) Calibration methods for improving the reliability of a model by adapting
the CSF or inducing calibration during training of f:
• Learn a post-hoc forecaster F: f(X) → [0, 1] on top of f (overview: [298];
a temperature-scaling sketch follows this list)
• Modify the training procedure with regularization (overview: [277, 370])
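A widely used instance of such a post-hoc forecaster is temperature scaling,
which divides the logits of a frozen f by a single scalar T fitted on
held-out data. The sketch below assumes PyTorch and illustrates the general
recipe rather than any specific forecaster surveyed in [298].

import torch
import torch.nn as nn

class TemperatureScaler(nn.Module):
    """Post-hoc forecaster: rescales frozen-model logits by a scalar T."""

    def __init__(self):
        super().__init__()
        self.log_t = nn.Parameter(torch.zeros(1))  # T = exp(log_t) > 0

    def forward(self, logits):
        return logits / self.log_t.exp()

    def fit(self, logits, labels, lr=0.01, steps=200):
        """Fit T by minimizing NLL on held-out logits; f itself is untouched."""
        opt = torch.optim.LBFGS([self.log_t], lr=lr, max_iter=steps)

        def closure():
            opt.zero_grad()
            loss = nn.functional.cross_entropy(self(logits), labels)
            loss.backward()
            return loss

        opt.step(closure)
        return self

Because only the single scalar T is learned, temperature scaling preserves
the accuracy and the ranking of f and only reshapes its confidence.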
Due to its importance in practice, we will provide more detail on train-time
calibration methods. It has been shown for a broad class of loss functions
that risk minimization leads to Fisher-consistent, Bayes-optimal classifiers
in the asymptotic limit [25, 495]. These losses can be shown to decompose
into a sum of multiple terms, including both accuracy and calibration error
[144, 177]. However, there is no guarantee, neither with finite data nor in
the asymptotic limit, that classifiers trained with proper loss functions
containing an explicit calibration term will eventually be well-calibrated.
In practice, the calibration term is entangled with the other optimization
terms, which often leads to sub-optimal calibration. For this reason, recent
studies [12, 230, 492] have derived trainable estimators of calibration
error to gain a more direct handle (γ > 0) on penalizing miscalibration,
i.e., by jointly optimizing the risk R(f) = E_{X,Y}[ℓ(Y, f(X))] and a
parameterized calibration error (CE) estimate, as in Equation (2.16).
f̂ = arg min_{f ∈ F} ( R(f) + γ CE(f) )        (2.16)
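To make Equation (2.16) concrete, the sketch below (assuming PyTorch) adds a
differentiable, kernel-based calibration penalty to a standard cross-entropy
risk. The penalty is a simplified variant in the spirit of MMCE-style
estimators [230]; the published estimators weight correctly and incorrectly
classified samples differently, so the names and the exact form here are
illustrative assumptions.

import torch
import torch.nn.functional as F

def mmce_penalty(logits, labels, bandwidth=0.4):
    """Differentiable kernel-based calibration penalty (simplified MMCE-style).

    A Laplacian kernel compares top-class confidences r_i against correctness
    indicators c_i; the penalty shrinks when confidence tracks accuracy.
    """
    probs = logits.softmax(dim=1)
    conf, pred = probs.max(dim=1)              # r_i, differentiable in logits
    correct = (pred == labels).float()         # c_i, treated as a constant
    gap = conf - correct                       # per-sample miscalibration
    kernel = torch.exp(-(conf.unsqueeze(0) - conf.unsqueeze(1)).abs() / bandwidth)
    # Positive semi-definite quadratic form; epsilon keeps sqrt() stable at 0.
    return ((gap.unsqueeze(0) * gap.unsqueeze(1) * kernel).mean() + 1e-12).sqrt()

def joint_loss(logits, labels, gamma=1.0):
    """Equation (2.16): empirical risk plus gamma times a trainable CE estimate."""
    risk = F.cross_entropy(logits, labels)              # R(f)
    return risk + gamma * mmce_penalty(logits, labels)  # R(f) + gamma * CE(f)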
Many of these methods implicitly or explicitly maximize the entropy of the
predictions, or the entropy relative to another probability distribution,
e.g., Entropy Regularization [361], Label Smoothing (LS) [327], Focal
Loss [324], and Margin-based LS [277], alongside more direct (differentiable)
kernel-based calibration error estimation [211, 230, 370, 492, 493, 526].
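For the entropy-raising regularizers just listed, minimal sketches of Label
Smoothing and Focal Loss are given below (again assuming PyTorch); the
hyperparameter values are illustrative defaults, not the settings of the
cited works.

import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, labels, epsilon=0.1):
    """Cross-entropy against smoothed targets: (1 - eps) mass on the true
    class plus eps spread uniformly, which keeps predictive entropy from
    collapsing."""
    n_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    smooth = torch.full_like(log_probs, epsilon / n_classes)
    smooth.scatter_(1, labels.unsqueeze(1), 1.0 - epsilon + epsilon / n_classes)
    return -(smooth * log_probs).sum(dim=1).mean()

def focal_loss(logits, labels, gamma=2.0):
    """Focal loss: down-weights already-confident examples by (1 - p_t)^gamma."""
    log_p = F.log_softmax(logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
    p_t = log_p.exp()
    return -((1.0 - p_t) ** gamma * log_p).mean()

Both act only on the target distribution or the loss weighting, so they slot
into an existing training loop without architectural changes.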
We had expected community contributions to the DUDE competition (Chapter 5)
to take advantage of this wealth of calibration methods, yet the majority of
submissions used uncalibrated models with the maximum softmax probability
(MSP), indicating that more education on the importance of calibration in
practice is still needed.