INTRODUCTION
Chapter 4 reflects on the current state of DU research and proposes guidelines to
foster document dataset construction efforts. It introduces two novel document
classification datasets, RVL-CDIP_MP and RVL-CDIP-N_MP, as extensions
of the RVL-CDIP dataset [165] with multipage documents. The datasets are
accompanied by a comprehensive experimental analysis, which shows the promise
of advancing multipage document representations and inference.
Chapter 5 introduces the multi-faceted DUDE benchmark for assessing
generic DU, which was also hosted as a competition to challenge the DU
community. It describes the complete methodology and design of the dataset,
targeting model innovations that can handle the complexity and variety of
real-world documents and subtasks, and generalize to any documents and any
questions. Alongside a discussion of the competition results, it also presents
our own comprehensive benchmarking study of SOTA LLMs, varying the
context length and the modalities that are represented.
Chapter 6 investigates how to efficiently obtain more semantic document layout
awareness. We explore what affects the teacher-student knowledge gap in
KD-based model compression methods, and design a downstream task setup
to evaluate the robustness of distilled DLA models on zero-shot layout-aware
DocVQA.
Finally, Chapter 7 concludes the thesis with a summary of the main contributions
(Section 7.1) and a discussion of future research directions. As a logical
follow-up to Chapter 5, we propose in Section 7.2.2.1 how the DUDE dataset could
be extended to become the ‘ultimate’ DU benchmark. The thesis ends with a
hypothetical, informed design of how the research presented would form part of
an end-to-end, fully-fledged IA-DU solution (Section 7.2.2.2).