REID2019 Competition Dataset (Recognition of Early Indian Printed Documents)
For the most part, the scanned images contain single-column text, with a small number also containing illustrations. Some pages contain marginal material such as numbers, handwritten notes, and decorative frames. The evaluation set consisted of 56 images chosen as a representative sample, ensuring a balanced presence of the different issues affecting layout analysis and OCR. These issues include non-straight text lines, show-through or bleed-through, faded ink, decorations, the presence of non-rectangular regions, varying text column widths, varying font sizes, the presence of separators, and various aging- and scanning-related issues. In addition to the evaluation set, 25 representative images were selected as the example set, which was provided to the authors together with ground truth.