1.2 Problem Statement and Questions
The general introduction sketches the context of the research and motivates
the research questions. In this section, I formulate the problem statement
and research questions more formally and explain how they relate to the
manuscript’s contents.
1.2.1 Reliable and Robust Deep Learning
The dissertation opens with the more fundamental challenge of targeting
reliability and robustness in Deep Learning, fairly abstract concepts that
have been used interchangeably and inconsistently in the literature. They
will be defined more extensively in Section 2.2, but for now, consider reliability
as the ability to avoid failure, robustness as the ability to resist failure, and
resilience as the ability to recover from failure [373, 438, 455]. In Chapter 3, we
focus on the more concrete objective of predictive uncertainty quantification
(PUQ), which shows promise for improving reliability and robustness in Deep
Learning (DL) [123, 140, 173, 455]. Concretely, PUQ methods are expected to
elucidate sources of uncertainty, such as a model’s lack of in-domain knowledge
due to either training data scarcity or model misspecification, and to flag
potentially noisy, shifted or unknown input data [136].
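To make the PUQ objective concrete, the following minimal sketch (mine, not
the dissertation’s) decomposes the total predictive uncertainty of T stochastic
forward passes (e.g., from MC Dropout) over a classifier’s softmax outputs into
an aleatoric (data noise) part and an epistemic (lack of knowledge) part. The
function names and the toy Dirichlet samples are illustrative assumptions.

import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy of (a batch of) categorical distributions.
    return -np.sum(p * np.log(p + eps), axis=axis)

def decompose_uncertainty(probs):
    # probs: array of shape (T, C), T sampled softmax outputs over C classes.
    mean_p = probs.mean(axis=0)        # posterior predictive distribution
    total = entropy(mean_p)            # total predictive uncertainty
    aleatoric = entropy(probs).mean()  # expected entropy: inherent data noise
    epistemic = total - aleatoric      # mutual information: lack of knowledge
    return total, aleatoric, epistemic

# Toy example: 10 stochastic predictions over 3 classes for one input.
rng = np.random.default_rng(0)
probs = rng.dirichlet([2.0, 1.0, 1.0], size=10)
print(decompose_uncertainty(probs))

Under this decomposition, high epistemic uncertainty can flag shifted or
unknown inputs (the model lacks in-domain knowledge), while high aleatoric
uncertainty points to inherently noisy or ambiguous inputs.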
We observed that the majority of prior PUQ research focused on regression and
CV tasks, while the applicability of PUQ methods had not been thoroughly
explored in the context of NLP. As mentioned earlier, most DU pipelines (in
2020) were text-centric with a high dependency on the quality of OCR. Since
OCR is often considered a solved problem [262], we hypothesized that the main
source of error and uncertainty in DU would reside in the text representations
learned by deep neural networks (DNNs). This is why we focused on the
more fundamental question: how well do PUQ methods scale in NLP? More
specifically, we restricted the scope to the prototypical, well-studied task of
text classification, for which we could leverage existing multi-domain datasets
varying in complexity, size and label space (multi-class vs. multi-label).
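To illustrate why the label-space distinction matters for PUQ (an aside, not
from the dissertation): a multi-class classifier produces a single categorical
distribution per input, whereas a multi-label classifier produces one Bernoulli
per label, so uncertainty must be aggregated across labels. A minimal sketch,
assuming independent labels:

import numpy as np

def multiclass_uncertainty(softmax_probs, eps=1e-12):
    # Entropy of one categorical distribution over mutually exclusive classes.
    p = np.asarray(softmax_probs)
    return -np.sum(p * np.log(p + eps))

def multilabel_uncertainty(sigmoid_probs, eps=1e-12):
    # Sum of binary entropies, one independent Bernoulli per label
    # (only valid under the label-independence assumption).
    p = np.asarray(sigmoid_probs)
    return -np.sum(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

print(multiclass_uncertainty([0.7, 0.2, 0.1]))  # one distribution, one entropy
print(multilabel_uncertainty([0.9, 0.5, 0.1]))  # three labels, summed entropies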
This scoping leads to the following research questions:
RQ 1. When tested on realistic language data distributions across various text
classification tasks, how well do PUQ methods fare in NLP?