INTRODUCTION

1.2 Problem Statement and Questions
The general introduction sketches the context of the research and motivates the research questions. In this section, I formulate the problem statement and research questions more formally and explain how they relate to the manuscript's contents.
1.2.1 Reliable and Robust Deep Learning
The dissertation opens with the more fundamental challenge of targeting reliability and robustness in Deep Learning, concepts that are fairly abstract and have been used interchangeably and inconsistently in the literature. They will be defined more extensively in Section 2.2, but for now, consider reliability as the ability to avoid failure, robustness as the ability to resist failure, and resilience as the ability to recover from failure [373, 438, 455]. In Chapter 3, we focus on the more concrete objective of predictive uncertainty quantification (PUQ), which shows promise for improving reliability and robustness in Deep Learning (DL) [123, 140, 173, 455]. Concretely, PUQ methods are expected to elucidate sources of uncertainty, such as a model's lack of in-domain knowledge due to either training data scarcity or model misspecification, and to flag potentially noisy, shifted, or unknown input data [136].
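To make the distinction between these sources of uncertainty concrete, consider the standard entropy-based decomposition over an ensemble of softmax predictions (e.g. from MC dropout or a deep ensemble): total predictive uncertainty splits into an aleatoric part (average per-member entropy, reflecting data noise) and an epistemic part (the mutual-information residual, reflecting the model's lack of knowledge). The sketch below is illustrative only and not the dissertation's own method; the function name and the toy ensembles are invented for this example.

```python
import numpy as np

def predictive_uncertainty(probs):
    """Decompose uncertainty from an ensemble of softmax predictions.

    probs: array of shape (n_members, n_classes), one softmax vector per
    stochastic forward pass or ensemble member. Returns
    (total, aleatoric, epistemic), where total is the entropy of the
    mean prediction, aleatoric is the mean of the per-member entropies,
    and epistemic = total - aleatoric (the mutual information).
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # avoid log(0)
    mean_p = probs.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Members agree on a confident prediction: epistemic term is ~0.
agree = [[0.9, 0.05, 0.05]] * 5
# Members disagree strongly: epistemic term dominates.
disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

t1, a1, e1 = predictive_uncertainty(agree)
t2, a2, e2 = predictive_uncertainty(disagree)
```

Under this decomposition, an input from a shifted or unknown distribution typically raises the epistemic term (members disagree), whereas an inherently ambiguous but in-domain input raises the aleatoric term.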
We observed that the majority of prior PUQ research focused on regression and CV tasks, while the applicability of PUQ methods had not been thoroughly explored in the context of NLP. As mentioned earlier, most DU pipelines (in 2020) were text-centric, with a high dependency on the quality of OCR. Since OCR is often considered a solved problem [262], we hypothesized that the main source of error and uncertainty in DU would reside in the text representations learned by deep neural networks (DNNs). This is why we focused on the more fundamental question of how well PUQ methods scale in NLP. More specifically, we restricted the scope to the prototypical, well-studied task of text classification, for which we could leverage existing multi-domain datasets varying in complexity, size, and label space (multi-class vs. multi-label).
This leads to the following research questions:

RQ 1. When tested on realistic language data distributions across various text classification tasks, how well do PUQ methods fare in NLP?