RESEARCH CONTEXT
This thesis started almost concurrently with the onset of the global COVID-19 pandemic, which made it hard to foster collaborations in the early stages. At the start of the PhD, DU methodology was fairly established, with OCR and Transformer-based pipelines such as BERT [94] and LayoutLM [502]. We therefore first prioritized the more fundamental challenge of decision-making under uncertainty (Part I), and only then moved closer to applied DU research (Part II).
The research community’s understanding of ‘reliability’ has also evolved over time. When we started the work of Chapter 3, reliability was mostly associated with uncertainty quantification and calibration. However, calibration is not a panacea, and only fairly recently did Jaeger et al. [193] propose a more general framework encompassing both reliability and robustness. They promote the more concrete and useful notion of failure prediction, which still involves confidence/uncertainty estimation but explicitly defines the failure source one wants to detect or guard against, e.g., in-domain test errors, changing input feature distributions, or novel class shifts. Since I share this view of the problem, I focused subsequent work on the more general notion of failure prediction, which is also more in line with the business context of IA.
Whereas we originally intended to work on multi-task learning of DU subtasks, the rise of general-purpose LLMs that offer a natural language interface to documents rather than discriminative modeling (e.g., ChatGPT [52, 344]) prompted us to evaluate this promising technology in the context of DU. More importantly, we observed a lack of sufficiently complex DU datasets and benchmarks that would allow us to tackle larger, more fundamental questions such as ‘Do text-only LLMs suffice for most low-level DU subtasks?’ (subsequently addressed in Chapter 5). This is why we shifted our focus to the more applied research questions of benchmarking and evaluation (Part II).
Finally, the business context has also evolved over time. Originally, IDP was offered by legacy OCR companies; by specialized vendors, offering a range of solutions for specific document types (e.g., invoices, contracts, tax forms); or by cloud service providers, offering IDP as part of a larger suite of services (e.g., AWS Textract, Azure Form Recognizer). However, the rise of both open-source LLM development and powerful, albeit closed-source, models has lowered the barrier to entry for new entrants and incumbents alike. This has led to a commoditization of IDP, in which the quality of the underlying LLMs and the ease of integration with existing business processes have become key differentiators.