RDCL2019 Competition Dataset (Recognition of Documents with Complex Layouts) (RDCL2019)

2019-05-31 (v. 1)

Contact author

Christian Clausner

University of Salford



You can cite this dataset as: Christian Clausner, RDCL2019 Competition Dataset (Recognition of Documents with Complex Layouts) (RDCL2019), 1, ID: RDCL2019_1, URL: https://tc11.cvc.uab.es/datasets/RDCL2019_1

Dataset Information

Dataset URL



complex layout, magazines, OCR, segmentation


For this competition, the evaluation set consisted of 85 images: ten new scans taken from IEEE Spectrum magazines and 75 images selected from the PRImA Layout Analysis dataset as a representative sample, ensuring a balanced presence of the different issues affecting layout analysis and OCR. Such issues include non-rectangular regions, varying text column widths, varying font sizes, separators, and regions of “reverse video” text (light-coloured text on a dark background). Running headers and captions of illustrations and photographs, which appear in addition to the main body of text, make it difficult to identify the correct reading order of the page.
