• Theoretical frameworks to generalize over existing metrics and to design novel metrics [43, 231, 492, 493]
• Specializations of metrics towards a task such as multi-class classification [463], regression [228, 428], or structured prediction [227]
• Alternative error estimation procedures, based on histogram regression [156, 331, 332, 340, 343], kernels [230, 370, 492, 493], or splines [159]; a sketch of the classic histogram-binned estimator follows this list
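To ground the histogram-based family, below is a minimal sketch of the classic binned calibration error estimator. The equal-width binning, the bin count, and the weighting scheme are illustrative defaults, not choices fixed by the references above.

    import numpy as np

    def binned_ece(confidences, correct, n_bins=15):
        """Histogram estimator: partition [0, 1] into equal-width confidence
        bins and average |accuracy - mean confidence| per bin, weighted by
        the fraction of samples falling in that bin."""
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += mask.mean() * gap
        return ece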
(B) Calibration methods for improving the reliability of a model by adapting the CSF or by inducing calibration during training of f:
• Learn a post-hoc forecaster F : f(X) → [0, 1] on top of f (overview: [298]); a temperature-scaling sketch follows this list
• Modify the training procedure with regularization (overview: [277, 370])
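Temperature scaling is one widely used instance of such a post-hoc forecaster; the sketch below fits a single scalar temperature on held-out validation logits by minimizing the negative log-likelihood. The LBFGS optimizer and the log-parameterization are common but illustrative choices.

    import torch

    def fit_temperature(val_logits, val_labels, max_iter=50):
        """Learn a scalar T > 0 such that softmax(logits / T) is better
        calibrated; the underlying model f itself stays frozen."""
        log_t = torch.zeros(1, requires_grad=True)  # T = exp(log_t) keeps T > 0
        opt = torch.optim.LBFGS([log_t], max_iter=max_iter)

        def closure():
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(
                val_logits / log_t.exp(), val_labels)
            loss.backward()
            return loss

        opt.step(closure)
        return log_t.exp().item()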
Due to its importance in practice, we will provide more detail on train-time calibration methods. It has been shown for a broad class of loss functions that risk minimization leads to Fisher-consistent, Bayes-optimal classifiers in the asymptotic limit [25, 495]. These losses can be shown to decompose into a sum of multiple metrics, including both accuracy and calibration error [144, 177]. However, there is no guarantee, with finite data or even asymptotically, that classifiers trained with proper loss functions containing an explicit calibration term will eventually be well-calibrated. In practice, the calibration term is entangled with the other optimization terms, which often leads to sub-optimal calibration. For this reason, recent studies [12, 230, 492] have derived trainable estimators of calibration error so as to gain a better handle (γ > 0) on penalizing miscalibration, i.e., by jointly optimizing the risk R(f) = E_{X,Y}[ℓ(Y, f(X))] and a parameterized calibration error (CE), as in Equation (2.16).
    f̂ = arg min_{f ∈ F} (R(f) + γ CE(f))        (2.16)
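As a concrete illustration of Equation (2.16), the sketch below adds a simplified kernel-based calibration penalty, in the spirit of the MMCE-style estimators cited above, to the cross-entropy risk. The Laplacian kernel and its bandwidth are illustrative assumptions, not the exact formulation of any single reference.

    import torch
    import torch.nn.functional as F

    def kernel_calibration_penalty(logits, labels, bandwidth=0.4):
        """Differentiable CE estimate: a kernel-weighted quadratic form over
        (confidence - correctness) residuals; gradients flow through the
        confidences, not through the 0/1 correctness indicators."""
        probs = logits.softmax(dim=1)
        conf, pred = probs.max(dim=1)
        resid = conf - (pred == labels).float()
        kernel = torch.exp(
            -(conf.unsqueeze(1) - conf.unsqueeze(0)).abs() / bandwidth)
        quad = (resid.unsqueeze(1) * resid.unsqueeze(0) * kernel).mean()
        return quad.clamp(min=0).sqrt()  # clamp guards numerical round-off

    def joint_loss(logits, labels, gamma=1.0):
        risk = F.cross_entropy(logits, labels)  # empirical risk R(f)
        return risk + gamma * kernel_calibration_penalty(logits, labels)  # Eq. (2.16)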
Many of these methods implicitly or explicitly maximize the entropy of the predictions, or the entropy relative to another probability distribution, e.g., Entropy Regularization [361], Label Smoothing (LS) [327], Focal Loss [324], and Margin-based LS [277], next to more direct (differentiable), kernel-based calibration error estimators [211, 230, 370, 492, 493, 526].
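Two of these entropy-raising losses are easy to state concretely, as sketched below; the smoothing factor and the focal exponent are common default values, not prescriptions from the cited works.

    import torch
    import torch.nn.functional as F

    # Label smoothing: supported natively by PyTorch's cross-entropy loss.
    ls_criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

    def focal_loss(logits, labels, gamma=2.0):
        """Down-weights well-classified examples by (1 - p_t)^gamma, which
        discourages over-confident predictions on easy samples."""
        log_pt = F.log_softmax(logits, dim=1).gather(
            1, labels.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return (-((1.0 - pt) ** gamma) * log_pt).mean()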
We had expected community contributions to the DUDE competition (Chapter 5) to take advantage of this wealth of calibration methods, yet the majority of submissions used uncalibrated models with MSP, indicating that more education on the importance of calibration in practice is needed.