Incidental Scene Occluded Text Dataset (ISTD-OC)
occlusion, natural scenes, text recognition
If you use this dataset, please consider citing the following paper:
SOARES, A. G. ; BEZERRA, BYRON L. D. ; LIMA, E. B. . How Far Deep Learning Systems for Text Detection and Recognition in Natural Scenes are Affected by Occlusion?. In: ICDAR 2021 WORKSHOP ON CAMERA-BASED DOCUMENT ANALYSIS AND RECOGNITION, 2021, Lausanne. Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021.
The ISTD-OC dataset is one of our contributions in "How Far Deep Learning Systems for Text Detection and Recognition in Natural Scenes are Affected by Occlusion?", published at CBDAR 2021. The dataset is derived from a systematic methodology in which the word instances in each image were occluded in different ways. The work is only a small step towards robustly recognizing occluded text in natural scenes; still, it is sufficient for an initial analysis of how well current models generalize in the presence of occlusion.
To build the ISTD-OC dataset, we collected 1500 images for detection and 4468 images for recognition from the ICDAR 2015 dataset. The core idea was to provide images with a range of partial occlusions and a certain degree of randomness. The occlusion size was determined from the size of the text relative to the original image: each occlusion is a random part of the original image overlaid on the text, and its size is sampled from a Gaussian distribution. In this way, small words are prevented from receiving very large occlusions, and vice versa. The occlusion level ranges from 0 to 100%, so that the performance of each method can be plotted against the occlusion level to show at what point it deteriorates.
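The occlusion step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the released implementation: the word box is taken as axis-aligned, the Gaussian is assumed to draw the fraction of the box width to cover, and `occlude_word`, `mu` and `sigma` are hypothetical names and parameters. Because the patch width is a fraction of the word's own width, small words only ever receive small occluders.

```python
import numpy as np


def occlude_word(image, box, mu=0.5, sigma=0.2, rng=None):
    """Overlay a randomly chosen patch of `image` on part of a word box.

    Hypothetical sketch: `box` is an axis-aligned (x0, y0, x1, y1) rectangle
    with exclusive x1/y1; `mu` and `sigma` are assumed parameters of the
    Gaussian that draws the occluded fraction of the box width.
    Returns the occluded copy and the fraction of the box width covered.
    """
    rng = np.random.default_rng() if rng is None else rng
    h_img, w_img = image.shape[:2]
    x0, y0, x1, y1 = box
    box_w, box_h = x1 - x0, y1 - y0

    # Draw the occlusion level from a Gaussian and clip it to [0, 1].
    level = float(np.clip(rng.normal(mu, sigma), 0.0, 1.0))
    # Occluder width is proportional to the word width.
    occ_w = int(round(level * box_w))
    if occ_w == 0:
        return image.copy(), 0.0

    # Place the occluder at a random horizontal offset inside the word box.
    ox = x0 + int(rng.integers(0, box_w - occ_w + 1))

    # Copy a same-sized patch from a random location of the original image.
    sx = int(rng.integers(0, w_img - occ_w + 1))
    sy = int(rng.integers(0, h_img - box_h + 1))
    patch = image[sy:sy + box_h, sx:sx + occ_w]

    out = image.copy()
    out[y0:y1, ox:ox + occ_w] = patch
    return out, occ_w / box_w
```

Setting `sigma=0` makes the level deterministic (equal to `mu`), which is convenient for generating a fixed sweep of occlusion levels when plotting performance against occlusion.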
Moreover, to make occluded text somewhat harder to identify, we added noise to the RGB channels of the occlusion. Since each occlusion is copied from another part of the same image, this prevents models from locating occlusions simply by learning to spot identical regions. Although ICDAR 2015 was used as the reference dataset for assessing the occlusion problem, the occlusion generation methodology can apply different levels of obstruction to the text of any other dataset, as long as the text regions are given by their coordinates. We chose this benchmark as the source only because it is widely used to evaluate baseline algorithms, which avoids possible inconsistencies during the evaluation process.
Occlusion generation process.
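The noise step and the reuse on other datasets described above might look roughly as follows. Both helpers are hypothetical: the noise model is not specified here, so additive per-channel Gaussian noise with an arbitrary `sigma` is an assumption, and `quad_to_box` simply reduces a four-point word annotation (the convention used by ICDAR 2015) to an axis-aligned box clipped to the image.

```python
import numpy as np


def add_rgb_noise(patch, sigma=10.0, rng=None):
    """Perturb each RGB channel of an occluding patch with Gaussian noise.

    Assumption: zero-mean Gaussian noise drawn independently per pixel and
    per channel, with a hypothetical standard deviation `sigma` in 0-255
    intensity units. The result is rounded and clipped back to uint8.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=patch.shape)
    noisy = patch.astype(np.float64) + noise
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def quad_to_box(quad, width, height):
    """Convert [x1, y1, ..., x4, y4] into an (x0, y0, x1, y1) box.

    Hypothetical helper: takes the bounding rectangle of the four corners,
    uses exclusive x1/y1, and clips to the image size, since annotated
    corners can fall slightly outside the image.
    """
    xs, ys = quad[0::2], quad[1::2]
    x0, y0 = max(min(xs), 0), max(min(ys), 0)
    x1, y1 = min(max(xs) + 1, width), min(max(ys) + 1, height)
    return x0, y0, x1, y1
```

Because the noise differs from pixel to pixel, the occluder no longer matches its source region exactly, which is the property the paragraph above relies on.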
The implementation code is available at: https://github.com/alinesoares1/ISTD-OC-Dataset.
The source archive is smaller than 1 GB and contains 25770 images, divided between the detection and recognition tasks. The images are in PNG format, and a ground-truth archive is also provided.
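The ground-truth format is not described here. Assuming ISTD-OC keeps the ICDAR 2015 detection format (one `x1,y1,x2,y2,x3,y3,x4,y4,transcription` line per word, with `###` marking illegible words), a file could be read as in the sketch below; `parse_detection_gt` is a hypothetical helper, so check the released archive before relying on it.

```python
from pathlib import Path


def parse_detection_gt(path):
    """Parse one ICDAR 2015-style detection ground-truth file.

    Assumption: each line is "x1,y1,x2,y2,x3,y3,x4,y4,transcription" and
    "###" marks words to ignore. This format is not confirmed for ISTD-OC.
    """
    words = []
    # utf-8-sig also strips a leading byte-order mark if one is present.
    for line in Path(path).read_text(encoding="utf-8-sig").splitlines():
        line = line.strip()
        if not line:
            continue
        # maxsplit=8 keeps commas that appear inside the transcription.
        parts = line.split(",", 8)
        coords = [int(v) for v in parts[:8]]
        text = parts[8]
        words.append({"quad": coords, "text": text, "ignore": text == "###"})
    return words
```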