ICDAR2026 Competition on Multimodal Reasoning over Documents in Multiple Domains

The validation data is out and ready to use!

The ICDAR2026 Competition on Multimodal Reasoning over Documents in Multiple Domains (DocVQA2026) builds on the successful DocVQA series of competitions. It evaluates models on documents from 8 different domains and tests their multimodal reasoning abilities by introducing richer question types. DocVQA2026 questions go beyond simple extraction and require reasoning over multiple sources of evidence across the document. The multimodal reasoning abilities that we expect evaluated methods to demonstrate include: spatial understanding (e.g. tested on maps, engineering drawings, and general layout tasks), temporal understanding (principally tested on comic stories), and multi-hop reasoning, where answers require gathering multiple pieces of evidence from throughout the document (e.g. combining information from text, tables, and figures).