of a foundation model for DU tasks (Chapters 4 to 6) or to contrast with 1-D
CNNs in text classification (Chapter 3). Note that the authors of [265] share
our concern that NLP needs a new ‘playground’ with more realistic tasks and
benchmarks, extending beyond sentence-level contexts to more complex
document-level tasks. Alternative sub-quadratic architectures have begun to
address the Transformer’s computational inefficiency on long sequences, e.g.,
Mamba [152] and LongNet [99]. Time will tell whether these can challenge the
Transformer’s dominance in foundation models.
2.2 Reliability and Robustness
Chapter 3 covers in detail the basic relation between uncertainty
quantification, calibration, and distributional generalization or detection
tasks. Here, we focus on the more general concepts of reliability and
robustness, and how they relate to concepts used throughout the rest of the
thesis. Next, we discuss the need for confidence estimation and appropriate
evaluation metrics, followed by short summaries of the main research trends in
calibration and uncertainty quantification.
Emerging guidance and regulations [2, 3, 475] place increasing importance on
the reliability and robustness of ML systems, particularly once they are used
in the public sphere or in safety-critical applications. In ML, reliability and
robustness are often used interchangeably [78, 420, 455], yet they are distinct
concepts, and it is important to understand the difference between them. This
thesis uses the following definitions of reliability and robustness, adapted
from the systems engineering literature [395]:
Definition 3 [Reliability]. Reliability is the ability of a system to
consistently perform its intended function in a specific, known environment for
a specific period of time, with a specific level of expected accuracy [395]. In
the ML context, this entails all evaluation under the i.i.d. assumption,
allowing for some benign shifts of the distribution, including predictive
performance evaluation with task-dependent metrics (accuracy, F1, perplexity,
etc.), calibration, selective prediction, uncertainty estimation, etc.
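To make two of the evaluation concepts above concrete, the following is a minimal sketch (not taken from the thesis; function names and the binning scheme are illustrative) of how calibration and selective prediction are commonly quantified: binned expected calibration error (ECE), and the error rate on the most-confident fraction of predictions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: |mean accuracy - mean confidence| per bin,
    weighted by the fraction of samples falling in that bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += (mask.sum() / n) * gap
    return ece

def selective_risk(confidences, correct, coverage=0.8):
    """Selective prediction: error rate when answering only on the
    `coverage` fraction of most-confident predictions."""
    order = np.argsort(-confidences)  # most confident first
    keep = order[: int(np.ceil(coverage * len(order)))]
    return 1.0 - correct[keep].mean()
```

A well-calibrated, reliable model has low ECE, and its selective risk decreases as coverage is reduced, i.e., confidence ranks correct predictions above incorrect ones.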
Reliability requires clearly specifying the role an ML component plays in a
larger system, and defining the expected behavior of the system as a function
of alignment with the training data distribution. This is particularly
important in the context of black-box models, where the inner workings of the
model are not transparent to the user. In this case, the user needs to be aware
of the model’s limitations, e.g., model misspecification, lack of training data, and the