of a foundation model for DU tasks (Chapters 4 to 6) or to contrast with 1-D
CNNs in text classification (Chapter 3). Note that [265] share our concerns that
NLP needs a new ‘playground’ with more realistic tasks and benchmarks, which
extend beyond sentence-level contexts to more complex document-level tasks.
Alternative sub-quadratic architectures have started to address the Transformer's
computational inefficiency on long sequences, e.g., Mamba [152] and LongNet
[99]. Time will tell whether these can compete with the Transformer's
dominance in foundation models.
2.2 Reliability and Robustness
Chapter 3 treats in detail the basic relation between uncertainty
quantification, calibration, and distributional generalization or
detection tasks. Here, we focus on the more general concepts of reliability
and robustness, and how they relate to concepts used throughout the rest of
the thesis. We then discuss the need for confidence estimation and appropriate
evaluation metrics, followed by short summaries of the main research trends in
calibration and uncertainty quantification.
Emerging guidance and regulations [2, 3, 475] place increasing importance on
the reliability and robustness of ML systems, particularly once they are deployed
in the public sphere or in safety-critical applications. In ML, reliability and
robustness are often used interchangeably [78, 420, 455], yet they are distinct
concepts, and it is important to understand the difference between them. This
thesis uses the following definitions of reliability and robustness, adapted from
systems engineering literature [395]:
Definition 3 [Reliability]. Reliability is the ability of a system to consistently
perform its intended function in a specific, known environment for a specific
period of time, with a specific level of expected accuracy [395]. In the ML
context, this entails all evaluation under the i.i.d. assumption, allowing for
some benign shifts of the distribution, and includes predictive performance
evaluation with task-dependent metrics (accuracy, F1, perplexity, etc.),
calibration, selective prediction, and uncertainty estimation.
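To make these reliability evaluations concrete, the following is a minimal
sketch (not code from this thesis; function names and the binning scheme are
illustrative) of two of the quantities named above: the binned expected
calibration error (ECE) and the risk/coverage pair used in selective
prediction, both computed from per-example confidences and correctness labels.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted average of |accuracy - mean confidence|
    per confidence bin. Confidences of exactly 0.0 fall outside all bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

def selective_risk(confidences, correct, threshold):
    """Selective prediction: accept examples at or above a confidence
    threshold; return (error rate on accepted subset, fraction accepted)."""
    accepted = confidences >= threshold
    if not accepted.any():
        return 0.0, 0.0
    return 1.0 - correct[accepted].mean(), accepted.mean()
```

Sweeping the threshold in `selective_risk` traces out a risk-coverage curve;
a reliable model's risk should fall monotonically as coverage shrinks.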
Reliability requires clearly specifying the role an ML component plays in a
larger system, and defining the expected behavior of the system as a function
of alignment with the training data distribution. This is particularly important
in the context of black-box models, where the inner workings of the model are
not transparent to the user. In this case, the user needs to be aware of the
model’s limitations, e.g., model misspecification, lack of training data, and the