Handwritten Collection QA
BenthamQA and HW-SQuAD datasets
These datasets are released as part of a new task we introduce, in which questions are asked on handwritten document collections.
Given a document collection and a natural language question, the task is to return a snippet of the document that answers the question being asked.
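The input/output shape of the task can be illustrated with a minimal sketch. This is not the method from the paper: it assumes each snippet is already available as transcribed text (the datasets themselves provide document images), and it ranks snippets by simple word overlap with the question. The names `tokenize` and `best_snippet` are hypothetical and used only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_snippet(question, snippets):
    """Return the snippet sharing the most word tokens with the question.

    Assumption: `snippets` is a non-empty list of text transcriptions.
    A real system would operate on handwritten word or line images.
    """
    question_counts = Counter(tokenize(question))

    def overlap(snippet):
        # Multiset intersection counts each shared token at most as often
        # as it occurs in both the question and the snippet.
        return sum((question_counts & Counter(tokenize(snippet))).values())

    return max(snippets, key=overlap)


if __name__ == "__main__":
    collection = [
        "Jeremy Bentham was born in London in 1748.",
        "The panopticon is a type of prison design.",
    ]
    print(best_snippet("Where was Bentham born?", collection))
```

Word overlap is chosen here only because it keeps the example short; it ignores word order, synonyms, and recognition errors, all of which matter on real handwritten collections.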
Images and Questions
HW-SQuAD is created from the existing SQuAD dataset: we render the passages of the original dataset as document images and reuse its questions. BenthamQA is a smaller dataset, but it contains real images from the Bentham handwritten manuscripts collection.
Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas and CV Jawahar - Asking Questions on Handwritten Document Collections - ICDAR-IJDAR special issue 2021 - [PDF]