September 6, Lausanne, Switzerland | hybrid event
Visual Question Answering has become a key task in the “vision and language” field. As VQA datasets and tasks matured, it became clear that many questions of common interest could not be answered unless the text appearing in the image was read and understood in the context provided by the visual information. Scene Text VQA was introduced in 2019 to address this gap.
In a setting closer to classic document images, visual question answering was introduced in 2018 in the form of DVQA, which focuses on understanding data visualisations through question answering. Understanding data visualisations such as bar charts, pie charts and plots requires understanding their structure and style as well as interpreting their textual and graphical elements.
The driving idea behind VQA approaches is that image analysis and information extraction are conditioned by the question. Document Analysis and Recognition research, on the other hand, tends to focus on generic bottom-up information extraction tasks (character recognition, table extraction, word spotting), largely disconnected from the final purpose for which the extracted information is used.
Driven by this observation, Document Visual Question Answering (DocVQA) was introduced in 2020. By DocVQA we refer to a generic paradigm for purpose-driven document analysis and recognition, in which natural language questions drive the information extraction and document understanding processes.
The DocVQA workshop aims to create a space at ICDAR to discuss the DocVQA paradigm and the results of the ICDAR 2021 long-term challenge on DocVQA. DocVQA 2021 follows the successful organization of the Document Visual Question Answering (DocVQA) challenge as part of the “Text and Documents in the Deep Learning Era” Workshop at CVPR 2020.
The DocVQA workshop will cover VQA approaches over document images, and also aims to highlight the role of image-rendered textual information in other relevant VQA branches, such as Scene Text VQA and data visualisation question answering (DVQA).