ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records (ICDAR2019HDRC)

Research Tasks

Complete, integrated textline detection and recognition on a large dataset

2019-08-29 (v. 1)

Contact author

Rajkumar Saini, Derek Dobson, Jon Morrey, Marcus Liwicki, Foteini Simistira Liwicki

LTU, Sweden

rajkumar.saini@ltu.se, marcus.liwicki@ltu.se

+46 (0)920 491006


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Description

The scope of this competition is to detect and recognize (OCR) the text lines in a given document image. The training data is also available in PAGE-XML format. Each PAGE-XML file contains the locations of the text lines and their corresponding text. Given a document image as input, a PAGE-XML file of the same form is expected as output.
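As an illustration of the expected input and output, the following is a minimal Python sketch of reading text lines from a PAGE-XML file. It assumes the common PAGE-XML layout in which each TextLine element carries a Coords element with a "points" attribute and a TextEquiv/Unicode child; the exact schema version used by this dataset is not stated here, so element names are matched without regard to namespace. The sample document, file name, and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample; real files from the dataset may use another schema version.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="200">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,10 30,10 30,190 10,190"/>
        <TextEquiv><Unicode>王氏家譜</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_text_lines(xml_string):
    """Return a list of (polygon, text) pairs, one per TextLine element.

    Assumes Coords stores its polygon as points="x1,y1 x2,y2 ...".
    """
    root = ET.fromstring(xml_string)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        polygon, text = [], ""
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords":
                polygon = [
                    tuple(int(v) for v in point.split(","))
                    for point in child.get("points", "").split()
                ]
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append((polygon, text))
    return lines


for polygon, text in read_text_lines(SAMPLE):
    print(polygon, text)
```

A system's output could be written in the same structure, with one TextLine per detected line holding the predicted polygon and transcription.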

Protocol

The evaluation is based on a graph-based string edit distance, averaged over all documents. Researchers are free to use other evaluation methods as well.

The evaluation tool is available here:

https://ltu.box.com/s/qi2s9c3cftp8ey0wz6ehwvykejox4gay
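For readers who want a quick sanity check without the official tool, the sketch below computes a plain Levenshtein (string edit) distance and averages it over documents. This is not the graph-based variant used by the evaluation tool, whose details are not given here; it is a simpler stand-in, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def mean_edit_distance(pairs):
    """Average the distance over (predicted_text, reference_text) pairs,
    one pair per document."""
    if not pairs:
        raise ValueError("at least one document pair is required")
    distances = [edit_distance(pred, ref) for pred, ref in pairs]
    return sum(distances) / len(distances)


print(edit_distance("kitten", "sitting"))
print(mean_edit_distance([("abc", "abc"), ("abc", "abd")]))
```

Here each document is reduced to a single string, which ignores line order and layout; the graph-based measure mentioned above presumably accounts for that structure, so scores from this sketch are not comparable to the official ones.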
