INTRODUCTION
Chapter 4 reflects on the current state of DU research and proposes guidelines to
foster document dataset construction efforts. It introduces two novel document
classification datasets, RVL-CDIP_MP and RVL-CDIP-N_MP, which extend
the RVL-CDIP dataset [165] with multipage documents. The datasets are
accompanied by a comprehensive experimental analysis, which shows the promise
of advancing multipage document representations and inference.
Chapter 5 introduces the multi-faceted DUDE benchmark for assessing
generic DU, which was also hosted as a competition to challenge the DU
community. It describes the complete methodology and design of the dataset,
which targets model innovations that can handle the complexity and variety of
real-world documents and subtasks, and that generalize to any document and any
question. Beyond a discussion of the competition results, it also presents
our own comprehensive benchmarking study of SOTA LLMs, varying the
context length and the modalities represented.
Chapter 6 investigates how to efficiently obtain more semantic awareness of document
layout. We explore what affects the teacher-student knowledge gap in
KD-based model compression methods and design a downstream task setup
to evaluate the robustness of distilled DLA models on zero-shot layout-aware
DocVQA.
Finally, Chapter 7 concludes the thesis with a summary of the main contributions
(Section 7.1) and a discussion of future research directions. As a logical follow-up
to Chapter 5, Section 7.2.2.1 proposes how the DUDE dataset could
be extended to become the ‘ultimate’ DU benchmark. The thesis ends with a
hypothetical, informed design of how the presented research could form part of
an end-to-end, fully-fledged IA-DU solution (Section 7.2.2.2).