Incidental Scene Occluded Text Dataset (ISTD-OC)

Research Tasks

Natural Scenes Text Recognition under Occlusion

2021-09-04 (v. 1)

Contact author

Aline Soares

University of Pernambuco


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.


Text detection and recognition in natural scenes have become vital and active research topics in computer vision and document analysis. One of the primary reasons is the rapid development of camera-based applications on portable devices such as smartphones and tablets, which has facilitated the acquisition and processing of large numbers of images containing text every day. With the increase in high-performance, low-cost image capturing devices, natural scene text recognition (STR) applications are rapidly expanding and becoming more popular. Consequently, several techniques have been explored in the past few years to solve scene text recognition. Although remarkable breakthroughs have been made in the pipeline, text recognition in natural images remains challenging because of significant variations in scene text color, font, orientation, language, and spatial layout, as well as uncontrollable backgrounds, camera resolution, partial occlusion, and motion blur, among other problems.

Recently, owing to their excellent performance on generic visual recognition, convolutional neural networks (CNNs) have become one of the most attractive approaches for scene text recognition, and many CNN-based methods have been proposed, for instance PSENet, EAST, PAN, and CRAFT. Such methods have obtained state-of-the-art accuracy on benchmarks such as ICDAR 2015, MSRA-TD500, and COCO-Text. However, some of these methods fail in complex cases such as partial occlusion and arbitrarily shaped or curved text, which is difficult to represent with a single rectangle. Partial occlusion remains an open issue. It acts as thick, contiguous, and spatially additive noise that partially hides one object behind another and represents a severe threat to a pattern recognition system’s performance, hindering the reliable observation and reconstruction of information.


Models evaluated on the ICDAR datasets commonly follow the ICDAR protocols for quantitative comparison among text detection techniques. To quantify the performance of the chosen methods, we used the standard evaluation metrics of those protocols: Precision (P) and Recall (R), both borrowed from the information retrieval field.
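As an illustration of how Precision and Recall can be computed for detection, the following is a minimal sketch rather than the official evaluation script. It assumes axis-aligned boxes, a greedy one-to-one matching, and an IoU threshold of 0.5; the function names `iou` and `precision_recall` are hypothetical, and the actual ICDAR protocol may differ (for example, by matching quadrilaterals and handling "do not care" regions).

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    detections: List[Box],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the source
) -> Tuple[float, float]:
    """Greedily match each detection to at most one unmatched ground-truth box."""
    matched = set()
    true_positives = 0
    for det in detections:
        for idx, gt in enumerate(ground_truth):
            if idx not in matched and iou(det, gt) >= iou_threshold:
                matched.add(idx)
                true_positives += 1
                break
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    dets = [(0, 0, 10, 10), (20, 20, 30, 30)]
    gts = [(0, 0, 10, 10), (50, 50, 60, 60), (100, 100, 110, 110)]
    p, r = precision_recall(dets, gts)
    print(f"P={p:.2f} R={r:.2f}")
```

In this sketch, Precision is the fraction of detections that match a ground-truth region and Recall is the fraction of ground-truth regions that were detected.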

When it comes to recognition, the most common evaluation metrics in STR systems are the word error rate (WER) and the character error rate (CER).
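Both error rates are commonly defined as the Levenshtein edit distance between the reference and the recognized text, divided by the length of the reference (in words for WER, in characters for CER). The sketch below assumes that definition and whitespace tokenization; the function names `edit_distance`, `cer`, and `wer` are illustrative, not part of any dataset tooling.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits per reference word (whitespace tokens)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # A partially occluded character misread as another one.
    print(cer("STOP", "ST0P"))
    print(wer("no parking here", "no parkin here"))
```

Because insertions count as errors, both rates can exceed 1 when the recognized text is much longer than the reference.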

