REID2019 Competition Dataset (Recognition of Early Indian Printed Documents)
For the most part, the scanned images contain single-column text, with a small number containing illustrations as well as text. Some pages contain marginal content such as numbers, handwritten notes, and decorative frames. The evaluation set consisted of 56 images, chosen as a representative sample to ensure a balanced presence of the different issues affecting layout analysis and OCR. Such issues include non-straight text lines, show-through or bleed-through, faded ink, decorations, non-rectangular regions, varying text column widths, varying font sizes, the presence of separators, and various aging- and scanning-related issues. In addition to the evaluation set, 25 representative images were selected as the example set, which was provided to the authors together with ground truth.