	RESEARCH CONTEXT

	This thesis started almost concurrently with the outbreak of the global COVID-19
	pandemic, which made it hard to foster collaborations in the early stages. At
	the start of the PhD, DU methodology was fairly established, relying on
	pipelines that combined OCR with Transformer-based models such as BERT [94]
	and LayoutLM [502]. We therefore first prioritized the more fundamental
	challenge of decision-making under uncertainty (Part I), before taking a step
	back toward more applied DU research (Part II).
	The research community’s understanding of ‘reliability’ has also evolved over
	time. When we started the work of Chapter 3, reliability was mostly associated
	with uncertainty quantification and calibration. However, calibration is not a
	panacea, and only fairly recently did Jaeger et al. [193] propose a more
	general framework encompassing both reliability and robustness. They promote
	the more concrete and useful notion of failure prediction, which still relies
	on confidence/uncertainty estimation but explicitly defines the source of
	failure one wants to detect or guard against, e.g., in-domain test errors,
	shifts in the input feature distribution, or novel-class shifts. Since I share
	this view of the problem, I have focused subsequent work on the more general
	notion of failure prediction, which is also more in line with the business
	context of IA.
	Although we originally intended to work on multi-task learning of DU subtasks,
	the rise of general-purpose LLMs (e.g., ChatGPT [52, 344]), which offer a
	natural-language interface to documents as an alternative to discriminative
	modeling, prompted us to evaluate this promising technology in the context of
	DU. More importantly, we observed a lack of sufficiently complex datasets and
	benchmarks in DU that would allow us to tackle larger, more fundamental
	questions such as ‘Do text-only LLMs suffice for most low-level DU subtasks?’
	(subsequently addressed in Chapter 5). We therefore shifted our focus to the
	more applied research questions of benchmarking and evaluation (Part II).
	Finally, the business context has also evolved over time. Originally, IDP was
	offered by legacy OCR companies; by specialized vendors offering a range of
	solutions for specific document types (e.g., invoices, contracts, tax forms);
	or by cloud service providers offering IDP as part of a larger suite of
	services (e.g., AWS Textract, Azure Form Recognizer). However, the rise of
	both open-source LLM development and powerful, though closed-source, models
	has lowered the barrier for new entrants and incumbents alike. This has led to
	a commoditization of IDP, in which the quality of the underlying LLMs and the
	ease of integration with existing business processes have become key
	differentiators.