1.2 Problem Statement and Questions
The general introduction sketches the context of the research and motivates
the research questions. In this section, I formulate the problem statement
and research questions more formally and explain how they relate to the
manuscript’s contents.
1.2.1 Reliable and Robust Deep Learning
The dissertation opens with the more fundamental challenge of targeting
reliability and robustness in Deep Learning, fairly abstract concepts that
have been used interchangeably and inconsistently in the literature. They
will be defined more extensively in Section 2.2, but for now, consider reliability
as the ability to avoid failure, robustness as the ability to resist failure, and
resilience as the ability to recover from failure [373, 438, 455]. In Chapter 3, we
focus on the more concrete objective of predictive uncertainty quantification
(PUQ), which shows promise for improving reliability and robustness in Deep
Learning (DL) [123, 140, 173, 455]. Concretely, PUQ methods are expected to
elucidate sources of uncertainty, such as a model’s lack of in-domain knowledge
due to either training data scarcity or model misspecification, and to flag
potentially noisy, shifted or unknown input data [136].
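To make the PUQ objective concrete, the following minimal sketch (mine, not
the dissertation’s) decomposes the total predictive uncertainty of T stochastic
forward passes (e.g., from MC Dropout) over a classifier’s softmax outputs into
an aleatoric (data noise) part and an epistemic (lack of knowledge) part. The
function names and the toy Dirichlet samples are illustrative assumptions.

import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy of (a batch of) categorical distributions.
    return -np.sum(p * np.log(p + eps), axis=axis)

def decompose_uncertainty(probs):
    # probs: array of shape (T, C), T sampled softmax outputs over C classes.
    mean_p = probs.mean(axis=0)        # posterior predictive distribution
    total = entropy(mean_p)            # total predictive uncertainty
    aleatoric = entropy(probs).mean()  # expected entropy: inherent data noise
    epistemic = total - aleatoric      # mutual information: lack of knowledge
    return total, aleatoric, epistemic

# Toy example: 10 stochastic predictions over 3 classes for one input.
rng = np.random.default_rng(0)
probs = rng.dirichlet([2.0, 1.0, 1.0], size=10)
print(decompose_uncertainty(probs))

Under this decomposition, high epistemic uncertainty can flag shifted or
unknown inputs (the model lacks in-domain knowledge), while high aleatoric
uncertainty points to inherently noisy or ambiguous inputs.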
We observed that the majority of prior PUQ research focused on regression and
CV tasks, while the applicability of PUQ methods had not been thoroughly
explored in the context of NLP. As mentioned earlier, most DU pipelines (in
2020) were text-centric with a high dependency on the quality of OCR. Since
OCR is often considered a solved problem [262], we hypothesized that the main
source of error and uncertainty in DU would reside in the text representations
learned by deep neural networks (DNNs). This is why we focused on the
more fundamental question: how well do PUQ methods scale in NLP? More
specifically, we restricted the scope to the prototypical, well-studied task of
text classification, for which we could leverage existing multi-domain datasets
varying in complexity, size and label space (multi-class vs. multi-label).
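To illustrate why the label-space distinction matters for PUQ (an aside, not
from the dissertation): a multi-class classifier produces a single categorical
distribution per input, whereas a multi-label classifier produces one Bernoulli
per label, so uncertainty must be aggregated across labels. A minimal sketch,
assuming independent labels:

import numpy as np

def multiclass_uncertainty(softmax_probs, eps=1e-12):
    # Entropy of one categorical distribution over mutually exclusive classes.
    p = np.asarray(softmax_probs)
    return -np.sum(p * np.log(p + eps))

def multilabel_uncertainty(sigmoid_probs, eps=1e-12):
    # Sum of binary entropies, one independent Bernoulli per label
    # (only valid under the label-independence assumption).
    p = np.asarray(sigmoid_probs)
    return -np.sum(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

print(multiclass_uncertainty([0.7, 0.2, 0.1]))  # one distribution, one entropy
print(multilabel_uncertainty([0.9, 0.5, 0.1]))  # three labels, summed entropies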
This scoping leads to the following research questions:
RQ 1. When tested on realistic language data distributions across various text
classification tasks, how well do PUQ methods fare in NLP?