ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records (ICDAR2019HDRC)
Dataset Information
Keywords
ICDAR2019HDRC, Historical Chinese documents, Document Image Analysis, Textline Recognition, Textline Detection, Text segmentation
Description
The dataset consists of 1172 images of Chinese documents, written mainly in traditional Han script. The images were taken from different books. The dataset was developed to support robust systems for historical document analysis. It is the basis of a competition, the Historical Document Reading Challenge on Large Structured Chinese Family Records (ICDAR 2019 HDRC Chinese for short), whose objective is to boost research on historical document analysis. The competition focuses on recognizing and analyzing the layout, and then detecting and recognizing the text lines and characters of the documents in this database.
Please refer to the following article:
@inproceedings{simistira2019icdar2019hdrc,
archivePrefix = {arXiv},
arxivId = {1903.03341},
eprint = {1903.03341},
author = {Saini, Rajkumar and Dobson, Derek and Morrey, Jon and Liwicki, Marcus and Simistira Liwicki, Foteini},
title = {{ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records}},
booktitle={{to appear in 15th International Conference on Document Analysis and Recognition (ICDAR)}},
year = {2019},
month = {mar}
}
Technical Details
The dataset consists of (1) document images in JPG format, (2) XML ground truth files, and (3) PNG ground truth files.
The XML files contain the ground truth for each text line: bounding box coordinates, the transcribed character string, writing direction, script type, and other information.
PNG files are the ground truth for text line detection and text segmentation.
The number of text lines and the size of characters vary across images. Graphical artifacts are present in many document images.
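The XML schema is not described on this page, so a reasonable first step is to inspect which elements a ground truth file actually contains. The sketch below counts element names in one XML document using only the Python standard library. The sample document, its namespace, and the tag names `Page`, `TextLine`, `Coords`, and `Text` are hypothetical placeholders and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_tags(xml_text: str) -> Counter:
    """Count how often each element name occurs in one XML document."""
    root = ET.fromstring(xml_text)
    return Counter(local_name(el.tag) for el in root.iter())


# Hypothetical sample; the real ground truth files may look quite different.
SAMPLE = """<Page xmlns="http://example.org/hypothetical">
  <TextLine><Coords points="10,10 10,200 40,200 40,10"/><Text>甲乙丙</Text></TextLine>
  <TextLine><Coords points="50,10 50,200 80,200 80,10"/><Text>丁戊</Text></TextLine>
</Page>"""

if __name__ == "__main__":
    # Print element names from most to least frequent.
    for name, count in summarize_tags(SAMPLE).most_common():
        print(name, count)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`. Once the real element names are known, the same traversal can be narrowed to the elements that carry coordinates and transcriptions.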
File | Type | Size | Downloads | Description |
---|---|---|---|---|
ICDAR2019HDRCdataset.zip | data | 1146 MB | 72 | Chinese Document Images |
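After downloading the archive, it can be useful to check that every JPG image has both kinds of ground truth. The following is a minimal sketch, assuming that an image and its XML and PNG ground truth files share the same file stem. That naming convention is an assumption, as the internal layout of the archive is not documented here. The helper names are illustrative.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_stem(member_names):
    """Map each file stem to the set of lowercase extensions found for it."""
    groups = defaultdict(set)
    for name in member_names:
        path = PurePosixPath(name)
        if name.endswith("/") or not path.suffix:
            continue  # skip directory entries and files without an extension
        groups[path.stem].add(path.suffix.lower())
    return dict(groups)


def images_missing_ground_truth(groups, required=(".xml", ".png")):
    """Return stems that have a .jpg but lack a required ground truth file."""
    return sorted(
        stem for stem, exts in groups.items()
        if ".jpg" in exts and not set(required) <= exts
    )


def inspect_archive(zip_path):
    """List the archive members without extracting them, then group them."""
    with zipfile.ZipFile(zip_path) as archive:
        groups = group_by_stem(archive.namelist())
    return groups, images_missing_ground_truth(groups)
```

Calling `inspect_archive("ICDAR2019HDRCdataset.zip")` returns the grouping and the list of images with incomplete ground truth. If the archive reuses the same stem in several folders, the grouping key would need to include the parent directory.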