REID2019 Competition Dataset (Recognition of Early Indian Printed Documents) (REID2019)

2019-05-31 (v. 1)

Contact author

Christian Clausner

University of Salford


You can cite this dataset as: Christian Clausner, REID2019 Competition Dataset (Recognition of Early Indian Printed Documents) (REID2019), 1, ID:REID2019_1, URL:

Dataset Information

Dataset URL




Most of the scanned images contain single-column text; a small number contain illustrations as well as text. Some pages contain marginal content such as numbers, handwritten notes, and decorative frames. The evaluation set consisted of 56 images chosen as a representative sample, ensuring a balanced presence of the different issues affecting layout analysis and OCR. Such issues include non-straight text lines, show-through or bleed-through, faded ink, decorations, non-rectangular regions, varying text column widths, varying font sizes, separators, and various aging- and scanning-related issues. In addition to the evaluation set, 25 representative images were selected as the example set, which was provided to the authors with ground truth.
