Handwritten Collection QA

BenthamQA and HW-SQuAD datasets

These datasets are released as part of a new task we introduce, in which questions are asked on handwritten document collections.

Task

Given a document collection and a natural language question, the task is to return a snippet of a document in the collection that answers the question.
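
The input and output of the task can be illustrated with a minimal Python sketch. It assumes each document has already been transcribed into text lines, which is a simplification for illustration only. The names `Snippet`, `tokenize` and `answer_snippet` and the toy collection are hypothetical, and word overlap is a stand-in scoring rule, not the method of the related publication.

```python
import re
from dataclasses import dataclass


@dataclass
class Snippet:
    doc_id: str  # which document in the collection the snippet comes from
    text: str    # the snippet returned as the answer


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_snippet(question, collection):
    """Return the snippet that shares the most words with the question.

    `collection` maps a document id to a list of text lines (assumed
    transcriptions). Returns None if no line shares a word with the question.
    """
    question_words = tokenize(question)
    best, best_score = None, 0
    for doc_id, lines in collection.items():
        for line in lines:
            score = len(question_words & tokenize(line))
            if score > best_score:
                best, best_score = Snippet(doc_id, line), score
    return best


# Toy collection, invented purely for illustration.
collection = {
    "doc_a": [
        "The committee met in London in the spring.",
        "Its report was printed the following year.",
    ],
    "doc_b": ["The harvest was poor that season."],
}

result = answer_snippet("Where did the committee meet?", collection)
print(result)
```

Here the function searches the whole collection rather than a single given document, which mirrors the task: the answering snippet must first be located among all documents.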

Images and Questions

HW-SQuAD is created from the existing SQuAD dataset: we render the passages of the original dataset as document images and reuse its questions. BenthamQA is a smaller dataset, but it contains real images from the Bentham handwritten manuscripts collection.


Download

BenthamQA: Images, Annotations

HW-SQuAD: Images, Annotations


Related publication