	INTRODUCTION

	can change depending on the context in which it is used. As an artifact of the
	communication channel, not all documents are born digitally, and the quality
	of the document can vary greatly, with some documents being handwritten,
	scanned at low resolution, or even captured as a photograph. Furthermore,
	documents often do not follow standardized templates and can be highly variable in
	terms of layout, structure, and content. Finally, the longer the document, the
	more computationally demanding it becomes to process, and the more likely it
	is to induce errors, which can be harder to detect.
	Addressing the inherent challenges of document processing and achieving high
	levels of accuracy, processing speed, reliability, robustness, and scalability in
	DU together form the applied scope of this thesis.
	(II) Consider the birth certificate example. While I might not much appreciate
	the manual handling of this document, I would be quite upset if my baby girl’s
	name (Feliz, written the Spanish way without an accent on the ‘e’) had been
	registered incorrectly, as this could have further repercussions.
	Whereas this error might be easily rectified, the same is not true in the
	case of a mortgage application, where the wrong information could lead to a
	rejection of the application, or even worse, a loan agreement with the wrong
	terms and conditions. This demonstrates that, even when full automation of
	document processing is in high demand, it is not always desirable if the risk of
	failure might be too large.
	Nevertheless, a lot of the potential for automation remains untapped, and
	organizations are increasingly looking for solutions to fully automate their
	document processing workflows. However, full automation, implying perfect
	recognition of document categories and impeccable information extraction, is an
	unattainable goal with the current state of technology [79].
	The more realistic objective is Intelligent Automation (IA) (elaborated
	on in Section 2.4), where the goal is to have the machine estimate confidence
	in its predictions, deriving business value from as high a volume as possible of
	perfect predictions (Straight-Through-Processing, STP) without incurring extra
	costs (False Positives, FP).
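	To make the trade-off between STP and FP concrete, the following is a minimal
	sketch, not the evaluation protocol of this thesis. It assumes each prediction
	carries a confidence score in [0, 1] and a correctness label, and that predictions
	at or above a threshold are automated while the rest go to human review. The
	names Prediction and route, the toy data, and the reading of FP as "automated
	but wrong" are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    confidence: float  # assumed: estimated probability of being correct, in [0, 1]
    correct: bool      # whether the prediction matches the ground truth


def route(predictions, threshold):
    """Split predictions into automated and human-reviewed at a threshold.

    Returns (stp_rate, fp_rate, review_rate), each a fraction of all predictions.
    """
    n = len(predictions)
    if n == 0:
        raise ValueError("need at least one prediction")
    automated = [p for p in predictions if p.confidence >= threshold]
    stp = sum(p.correct for p in automated)  # automated and correct
    fp = len(automated) - stp                # automated but wrong (assumed FP reading)
    review = n - len(automated)              # deferred to a human
    return stp / n, fp / n, review / n


# Hypothetical toy data, for illustration only.
preds = [
    Prediction(0.99, True),
    Prediction(0.95, True),
    Prediction(0.90, False),
    Prediction(0.60, True),
    Prediction(0.40, False),
]

for t in (0.5, 0.92):
    stp_rate, fp_rate, review_rate = route(preds, t)
    print(f"threshold={t:.2f} STP={stp_rate:.2f} FP={fp_rate:.2f} review={review_rate:.2f}")
```

	On this toy data, raising the threshold from 0.5 to 0.92 removes the single
	automated error but lowers the STP rate from 0.6 to 0.4, which is the tension
	that confidence estimation is meant to manage.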
	The leitmotif of this thesis will be the fundamental enablers of IA: confidence
	estimation and failure prediction.
	Calibrated uncertainty estimation with efficient and effective DU technology
	will allow organizations to confidently automate their document processing
	workflow, while keeping a human in the loop only for predictions with a higher
	likelihood of being wrong. To date, however, little research has addressed the
	question of how to make DU technology more reliable, as is illustrated in a toy
	analysis (Table 1.1) reporting the absence of many IA-related keywords in the
	Proceedings of the 2021 International Conference on Document Analysis and