RDCL2019 Competition Dataset (Recognition of Documents with Complex Layouts)

Dataset Information
Dataset URL
https://www.primaresearch.org/datasets/RDCL2019
Keywords
complex layout, magazines, OCR, segmentation
Description
For this competition, the evaluation set consisted of 85 images: ten new scans taken from IEEE Spectrum magazines and 75 images selected from the PRImA Layout Analysis dataset as a representative sample, ensuring a balanced presence of the different issues that affect layout analysis and OCR. Such issues include non-rectangular regions, varying text column widths, varying font sizes, separators, and regions of “reverse video” text (light-coloured text on a dark background). Running headers and captions of illustrations or photographs, which appear in addition to the main body of text, make it difficult to identify the correct reading order of the page.