Handwritten Collection QA

BenthamQA and HW-SQuAD datasets

These datasets are released as part of a new task we introduce, in which questions are asked on handwritten document collections.

Task

Given a document collection and a natural language question, the task is to return a snippet of the document that answers the question being asked.
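The input/output contract of the task can be illustrated with a minimal sketch. This is not the method from the related publication: it is a toy word-overlap baseline, and it assumes each candidate snippet already comes with a text transcription, which the task itself does not provide. The names Snippet, tokenize and answer_snippet are hypothetical and introduced only for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """Hypothetical record: one candidate region of one document image."""
    doc_id: str
    text: str  # assumed transcription of the region (not given by the task)


def tokenize(s: str) -> set[str]:
    """Lowercase the string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def answer_snippet(question: str, collection: list[Snippet]) -> Snippet:
    """Return the snippet that shares the most distinct words with the question.

    Toy baseline only: a real system has to work from the images themselves.
    """
    if not collection:
        raise ValueError("collection is empty")
    q_words = tokenize(question)
    return max(collection, key=lambda s: len(q_words & tokenize(s.text)))
```

For example, given a made-up two-snippet collection, the function returns the snippet whose words best match the question:

```python
collection = [
    Snippet("d1", "The panopticon is a prison design"),
    Snippet("d2", "Taxes on paper were raised"),
]
best = answer_snippet("Who designed the panopticon prison?", collection)
print(best.doc_id)
```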

Images and Questions

HW-SQuAD is created from the existing SQuAD dataset. We render the passages in the original dataset as document images and reuse the questions. BenthamQA is a smaller dataset, but it contains real images from the Bentham handwritten manuscripts collection.


Download

BenthamQA: Images, Annotations

HW-SQuAD: Images, Annotations


Related publication

  • Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas and CV Jawahar - Asking Questions on Handwritten Document Collections - ICDAR-IJDAR special issue 2021 - [PDF]