Workshop on Document Visual Question Answering

September 6, Lausanne, Switzerland | hybrid event

Visual Question Answering has become a key task in the “vision and language” field. As VQA datasets and tasks matured, it became clear that there were numerous questions of common interest which could not be answered unless the text that appears in the image could be read and understood in the context provided by the visual information. Scene Text VQA was introduced in 2019 to address this space.

Closer to classic document images, visual question answering was introduced in 2018 in the form of DVQA, focusing on understanding data visualisations through question answering. Understanding data visualisations such as bar charts, pie charts and plots requires understanding structure and style as well as interpreting textual and graphical elements.

The driving idea behind VQA approaches is that image analysis and information extraction are conditioned by the question. Document Analysis and Recognition research, on the other hand, tends to focus on generic bottom-up information extraction tasks (character recognition, table extraction, word spotting), largely disconnected from the final purpose the extracted information is used for.

Driven by this observation, the task of DocVQA was born in 2020. By Document Visual Question Answering (DocVQA) we refer to a generic paradigm for purpose-driven document analysis and recognition, where natural language questions drive the information extraction and document understanding processes.

The DocVQA workshop aims to create space at ICDAR to discuss the DocVQA paradigm and the results of the ICDAR 2021 long-term challenge on DocVQA. DocVQA 2021 comes after the successful organization of the Document Visual Question Answering (DocVQA) challenge as part of the “Text and Documents in the Deep Learning Era” workshop at CVPR 2020.

The DocVQA workshop will cover VQA approaches over document images, and also aims to highlight the role of image-rendered textual information in other relevant VQA branches, such as Scene-Text VQA and data visualization question answering (DVQA).


  • Invited speakers confirmed: Amanpreet Singh (Facebook), Dr. Yijuan Lu (Microsoft), and Dr. Brian Price (Adobe)

  • The 2021 edition of the DocVQA challenge has come to an end. See results here. The winners will be presenting their solutions at the workshop.


DocVQA is a half-day workshop scheduled for the afternoon of Monday, 6 September 2021, from 14:00 to 17:30.

The workshop will take place in the context of the 16th Int. Conf. on Document Analysis and Recognition (ICDAR 2021). To register, please see details at the ICDAR web site.

ICDAR will take place as a hybrid conference.

How to join on-site: if you are attending ICDAR in person, you can join the workshop at "Virtual Room 2" / BC03 of the EPFL (Swiss Federal Institute of Technology). See details on the ICDAR website.

How to join online: if you are a registered ICDAR participant, please access the ICDAR online portal with your username and password. When the sessions go live, look for the "ACCESS STREAMING" button next to the workshop sessions.