DocVQA dataset

DocVQA dataset (2020 Challenge task 1 dataset)

This is the first dataset we introduced as part of the DocVQA project, which is why it is simply called the DocVQA dataset.


As in the typical VQA task, the task is to answer questions asked about a given document image. As in the extractive QA framework popular in NLP, the answer to each question is always a single span of text extracted from the given document image.
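To make the extractive framing concrete, here is a minimal sketch of how per-token model scores could be turned into a single answer span. It assumes the document's text is already available as a list of OCR tokens and that some model (not specified here) has produced a start score and an end score for each token. The functions `best_span` and `extract_answer`, the `max_len` limit, and the example tokens and scores are hypothetical illustrations. The approach is the common start/end span decoding used in extractive QA in NLP, not necessarily the method used by any DocVQA baseline.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 10) -> Tuple[int, int]:
    """Return inclusive (start, end) token indices maximizing
    start_scores[start] + end_scores[end], with start <= end < start + max_len.

    Hypothetical helper: illustrates single-span extractive decoding.
    """
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(start_scores)
    best, best_score = (0, 0), float("-inf")
    for i in range(n):
        # Only consider ends at or after the start, and cap the span length.
        for j in range(i, min(n, i + max_len)):
            score = start_scores[i] + end_scores[j]
            if score > best_score:  # strict '>' keeps the earliest best span on ties
                best, best_score = (i, j), score
    return best


def extract_answer(tokens: List[str], start_scores: List[float],
                   end_scores: List[float], max_len: int = 10) -> str:
    """Join the OCR tokens of the best-scoring span into the answer string."""
    i, j = best_span(start_scores, end_scores, max_len)
    return " ".join(tokens[i:j + 1])


if __name__ == "__main__":
    # Hypothetical OCR tokens and scores for a question like "What is the date?"
    tokens = ["Invoice", "Date:", "March", "3,", "1998", "Total:", "$450"]
    start = [0.1, 0.2, 3.0, 0.5, 0.1, 0.0, 0.2]
    end = [0.0, 0.1, 0.3, 0.4, 2.5, 0.1, 0.3]
    print(extract_answer(tokens, start, end))  # prints: March 3, 1998
```

The brute-force search costs O(n × max_len) score comparisons, which is cheap for a page's worth of tokens. Capping the span length keeps the answer a short contiguous piece of the document text, in line with the single-span constraint described above.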

Images and Questions

There are 50K questions and 12K images in the dataset. The images are collected from the UCSF Industry Documents Library, and the questions and answers are manually annotated.


The dataset can be downloaded from the challenge page on the RRC (Robust Reading Competition) portal: go to the "Download" tab on the challenge page and use the links under "Single Document Visual Question Answering".

Related publications

  • Minesh Mathew, Dimosthenis Karatzas and C.V. Jawahar - DocVQA: A Dataset for VQA on Document Images - WACV 2021

  • Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha and C.V. Jawahar - Document Visual Question Answering Challenge 2020 - DAS 2020 (Short)