ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records (ICDAR2019HDRC)

Research Tasks

Complete, integrated textline detection and recognition on a large dataset

2019-12-30 (v. 2)

Contact author

Rajkumar Saini, Derek Dobson, Jon Morrey, Marcus Liwicki, Foteini Simistira Liwicki

LTU, Sweden

rajkumar.saini@ltu.se, marcus.liwicki@ltu.se

+46 (0)920 491006

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.


The task in this competition is to detect and recognize (OCR) the text lines in a given document image. The training data is also available in PAGE-XML format. Each PAGE-XML file contains the locations of the text lines and their corresponding text. Given a document image as input, a PAGE-XML file of the same form is expected as output.
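To illustrate the kind of information a PAGE-XML file carries, the following is a minimal sketch that reads text-line polygons and their transcriptions using only the Python standard library. It assumes the common PAGE layout in which each `TextLine` element has a `Coords` child with a `points` attribute and a `TextEquiv/Unicode` child. The helper names (`local_name`, `read_text_lines`) and the sample document, including its coordinates and characters, are invented for illustration and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_text):
    """Return a list of (polygon, text) pairs, one per TextLine element.

    polygon is a list of (x, y) integer tuples taken from Coords/@points;
    text is the content of TextEquiv/Unicode (empty string if absent).
    """
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        polygon, text = [], ""
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords":
                polygon = [
                    tuple(int(v) for v in pair.split(","))
                    for pair in child.get("points", "").split()
                ]
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append((polygon, text))
    return lines


# Hypothetical sample, not an excerpt from the competition data.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="300">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,20 10,200 40,200 40,20"/>
        <TextEquiv><Unicode>王氏家譜</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for polygon, text in read_text_lines(SAMPLE):
        print(polygon, text)
```

Matching on local element names rather than a fixed namespace is a design choice made here so the sketch does not depend on which PAGE schema version the files declare.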


The extended version v2 of the dataset for the task is available at 


and the password is (without quotes)

"I hereby accept the Terms and Conditions of the ICDAR 2019 HDRC-Chinese Competition"


Evaluation is based on a graph-based string edit distance, averaged over all documents. Researchers are nevertheless free to use other methods to evaluate performance.

The evaluation tool is found here.
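For readers who want a quick local check, the sketch below computes a plain character-level Levenshtein distance and averages it over documents. This is a stand-in, not the official evaluation tool: the graph-based variant used by the competition is not specified here, and whether the official score is normalized by text length is not stated, so this sketch averages raw distances. The function names `edit_distance` and `mean_edit_distance` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            curr.append(
                min(
                    prev[j] + 1,               # delete ca
                    curr[j - 1] + 1,           # insert cb
                    prev[j - 1] + (ca != cb),  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


def mean_edit_distance(pairs):
    """Average raw edit distance over (predicted, ground_truth) pairs,
    where each pair holds the full transcription of one document."""
    if not pairs:
        raise ValueError("at least one document is required")
    return sum(edit_distance(p, g) for p, g in pairs) / len(pairs)
```

Python strings are sequences of Unicode code points, so each Chinese character counts as one symbol in this distance.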


