	PROBLEM STATEMENT AND QUESTIONS

	7

	RQ 2. In which settings are PUQ methods most useful, i.e., which failure sources
	or distribution shifts are they most sensitive to?
	RQ 3. How can we obtain better PUQ estimates without over-relying on
	computationally prohibitive methods, e.g., Deep Ensembles [238]?
	RQ 4. How important is the choice of prior, neural architecture, and
	hyperparameters for the quality of PUQ estimation?
	In a later chapter (Chapter 5), we introduce a complex benchmark for generic
	DU that additionally tests robustness to domain, visual, and layout shifts,
	and explores the novel problem of hallucination and control in natural language
	generation (NLG) with LLMs from the perspective of calibrated and selective
	DocVQA. The general task formulation involves a natural language question (on
	content, aspect, form, or visual/layout), an input document, and a set of reference
	answers. The model is expected to provide a natural language answer, an answer
	confidence, and a (binary) abstention decision. Evaluation covers answer
	correctness, calibration, and selective prediction. On the one hand, a model
	should lower its confidence when it is unsure whether a predicted answer is
	correct. On the other hand, it should abstain from answering, rather than
	hallucinate, on unanswerable questions (which were explicitly added to the
	dataset).
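	To make the three evaluation axes concrete, the following is a minimal Python sketch, not the benchmark's official evaluation code. It assumes ANLS (Average Normalized Levenshtein Similarity, threshold 0.5) for answer correctness, binned expected calibration error (ECE) for calibration, and the area under the risk-coverage curve (AURC) for selective prediction. The `Prediction` schema, the convention that an empty reference list marks an unanswerable question, and the way abstentions are scored are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prediction:
    """One model output for a (question, document) pair (hypothetical schema)."""
    answer: str
    confidence: float  # assumed to lie in [0, 1]
    abstain: bool      # binary abstention decision
    references: List[str] = field(default_factory=list)  # empty => unanswerable (assumption)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def anls(answer: str, references: List[str], tau: float = 0.5) -> float:
    """Best normalized Levenshtein similarity over references; 0 if distance >= tau."""
    a = answer.strip().lower()
    best = 0.0
    for ref in references:
        r = ref.strip().lower()
        nl = levenshtein(a, r) / (max(len(a), len(r)) or 1)
        best = max(best, 1.0 - nl if nl < tau else 0.0)
    return best


def correctness(p: Prediction) -> float:
    # Assumption: abstaining is correct only on unanswerable questions, and
    # answering an unanswerable question counts as a hallucination (score 0).
    if not p.references:
        return 1.0 if p.abstain else 0.0
    if p.abstain:
        return 0.0
    return anls(p.answer, p.references)


def ece(confs: List[float], scores: List[float], n_bins: int = 10) -> float:
    """Expected calibration error over equal-width confidence bins (lo, hi]."""
    n, total = len(confs), 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confs)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if idx:
            acc = sum(scores[i] for i in idx) / len(idx)
            conf = sum(confs[i] for i in idx) / len(idx)
            total += len(idx) / n * abs(acc - conf)
    return total


def aurc(confs: List[float], scores: List[float]) -> float:
    """Mean selective risk over coverages k/n, keeping the k most confident answers."""
    order = sorted(range(len(confs)), key=lambda i: -confs[i])
    risk_sum, area = 0.0, 0.0
    for k, i in enumerate(order, start=1):
        risk_sum += 1.0 - scores[i]
        area += risk_sum / k
    return area / len(order)


def evaluate(preds: List[Prediction]) -> dict:
    if not preds:
        raise ValueError("need at least one prediction")
    scores = [correctness(p) for p in preds]
    confs = [p.confidence for p in preds]
    return {
        "correctness": sum(scores) / len(scores),
        "ece": ece(confs, scores),
        "aurc": aurc(confs, scores),
    }


if __name__ == "__main__":
    preds = [
        Prediction("Paris", 0.9, False, ["Paris"]),   # correct answer
        Prediction("", 0.2, True, []),                 # correct abstention
        Prediction("1999", 0.8, False, []),            # hallucinated answer
        Prediction("Lodon", 0.6, False, ["London"]),   # near-miss, partial ANLS credit
    ]
    print({k: round(v, 3) for k, v in evaluate(preds).items()})
    # prints {'correctness': 0.708, 'ece': 0.483, 'aurc': 0.295}
```

	In this sketch, the confidently hallucinated answer to the unanswerable question is what inflates both ECE and AURC, which mirrors the two expectations stated above: lower confidence when unsure, and abstain instead of answering unanswerable questions.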
	RQ 5. How severe is the problem of hallucination and control in LLMs when
	evaluated in a selective, free-form DocVQA task setting?

	1.2.2 Realistic and Efficient Document Understanding

	The second part of the dissertation focuses on the more applied research questions
	of realistic and efficient DU. The overall objective is to make DU technology
	more generically applicable (Chapter 5) and more efficient at modeling the
	multimodal and compositional nature of documents (Chapters 5 and 6), and to
	bring its evaluation more in line with real-world requirements (Chapters 4 and 5).
	Due to their proximity to business applications and the risk of leaking personal
	information, DU research benchmarks have diverged substantially from the
	real-world distributions of document data. For instance, DU datasets are often
	limited to single-page document images, are from outdated sources (e.g., IIT-