RDCL2019 Competition Dataset (Recognition of Documents with Complex Layouts) (RDCL2019)

2019-05-31 (v. 1)

Contact author

Christian Clausner

University of Salford

c.clausner@salford.ac.uk

01612956749

You can cite this dataset as: Christian Clausner, RDCL2019 Competition Dataset (Recognition of Documents with Complex Layouts) (RDCL2019), 1, ID: RDCL2019_1, URL: https://tc11.cvc.uab.es/datasets/RDCL2019_1

Dataset Information

Dataset URL

https://www.primaresearch.org/datasets/RDCL2019

Keywords

complex layout, magazines, OCR, segmentation

Description

For this competition, the evaluation set consisted of 85 images: ten new scans taken from IEEE Spectrum magazines and 75 images selected from the PRImA Layout Analysis dataset as a representative sample, ensuring a balanced presence of different issues affecting layout analysis and OCR. Such issues include non-rectangular regions, varying text column widths, varying font sizes, separators, and regions of “reverse video” text (light-coloured text on a dark background). Running headers and captions of illustrations and photographs, which appear in addition to the main body of text, make it difficult to identify the correct reading order of the page.
