ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records (ICDAR2019HDRC)
Complete, integrated textline detection and recognition on a large dataset
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
The scope of this competition is to detect and recognize (OCR) the text lines in a given document image. The training data is also available in PAGE-XML format. Each PAGE-XML file contains the text lines’ locations and their corresponding text. Given a document image as input, a PAGE-XML file of the same form is expected as output.
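As an illustration of the expected input and output format, the sketch below reads text line polygons and transcriptions from a PAGE-XML document using only the Python standard library. The sample XML, the namespace version, the file name `example.jpg`, and the helper names `local_name` and `read_text_lines` are assumptions made for this example; they are not taken from the competition data, and the actual files may carry additional elements or attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PAGE-XML document; real competition files may differ.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="10,10 200,10 200,900 10,900"/>
      <TextLine id="l1">
        <Coords points="20,20 60,20 60,880 20,880"/>
        <TextEquiv><Unicode>王氏家譜</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any PAGE schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_text):
    """Return one dict per TextLine with its id, polygon points, and text."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        points, text = [], ""
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords":
                # 'points' is a space-separated list of "x,y" pairs.
                points = [
                    tuple(int(v) for v in pair.split(","))
                    for pair in child.get("points", "").split()
                ]
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append({"id": elem.get("id"), "points": points, "text": text})
    return lines


if __name__ == "__main__":
    for line in read_text_lines(SAMPLE):
        print(line["id"], line["points"], line["text"])
```

Matching on local tag names rather than a fixed namespace is a deliberate choice here, since the schema version used by the dataset is not stated on this page.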
The extended version v2 of the dataset for the task is available at
and the password is (without quotes)
"I hereby accept the Terms and Conditions of the ICDAR 2019 HDRC-Chinese Competition"
The evaluation is based on a graph-based string edit distance, averaged over all documents. Researchers are nevertheless free to evaluate performance with other methods as well.
The evaluation tool is found here.
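For readers who want a quick local check, the sketch below averages a plain character-level Levenshtein distance over documents. This is a simplified stand-in, not the graph-based edit distance used by the official tool, so its scores are not expected to match the competition results. The function names `levenshtein` and `mean_edit_distance`, and the choice to compare one reference string with one predicted string per document, are assumptions made for this example.

```python
def levenshtein(a, b):
    """Character-level edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def mean_edit_distance(pairs):
    """Average the edit distance over (reference, prediction) document pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one document pair is required")
    return sum(levenshtein(ref, hyp) for ref, hyp in pairs) / len(pairs)


if __name__ == "__main__":
    docs = [("王氏家譜", "王氏家譜"), ("王氏家譜", "王氏家谱")]
    print(mean_edit_distance(docs))
```

The distance is left unnormalized here; dividing each document's distance by its reference length would give a character error rate instead, which is another common choice.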