Scene text recognition has attracted considerable interest in recent years. The problem has been studied in two settings: (i) closed vocabulary, where a lexicon that contains the ground truth is provided with every word image, and (ii) open vocabulary, where the ground truth word may or may not belong to the English dictionary (e.g., house numbers, proper nouns). This dataset can be used for recognition in both settings, and it is not only significantly larger but also more challenging than other popular datasets such as ICDAR and SVT. Moreover, since the dataset also comes with character bounding box annotations, it can be used for reporting scene character recognition performance.


Word recognition: The IIIT 5K-word dataset is divided into train and test sets. We also provide small, medium and large lexicons along with the dataset. These lexicons should be used to report case-insensitive word recognition accuracies in different settings (open/closed vocabulary).
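
As a rough illustration of case-insensitive scoring, the sketch below computes word recognition accuracy from two parallel lists of strings. The function name `word_accuracy` and the sample words are hypothetical and are not part of the dataset or its tooling; loading the annotations and lexicons is left out because their file layout is not described here.

```python
def word_accuracy(predictions, ground_truths):
    """Fraction of words predicted correctly, ignoring letter case."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    # casefold() normalizes case, so "Hotel" and "HOTEL" count as a match.
    correct = sum(
        p.casefold() == g.casefold() for p, g in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)


# Illustrative values only, not taken from the dataset.
preds = ["Hotel", "OPEN", "cafe", "5OO"]
truth = ["HOTEL", "open", "CAFE", "500"]
print(word_accuracy(preds, truth))  # 0.75 (the last word confuses "O" with "0")
```

In a closed vocabulary setting, the predictions passed to such a function would typically already be restricted to words from the lexicon supplied with each image.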

Scene character recognition: The IIIT 5K-word dataset can also be used for reporting case-sensitive character recognition accuracy on 62 classes.
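
A minimal sketch of case-sensitive character scoring follows. It assumes the 62 classes are the 10 digits plus the 26 uppercase and 26 lowercase English letters; the names `CHAR_CLASSES` and `char_accuracy` and the sample labels are hypothetical, not part of the dataset.

```python
import string

# Assumption: 10 digits + 26 uppercase + 26 lowercase letters = 62 classes.
CHAR_CLASSES = tuple(string.digits + string.ascii_uppercase + string.ascii_lowercase)
_CLASS_SET = frozenset(CHAR_CLASSES)


def char_accuracy(predicted, actual):
    """Fraction of characters predicted correctly, with case treated as significant."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # Reject ground truth labels that fall outside the assumed 62 classes.
    unknown = [c for c in actual if c not in _CLASS_SET]
    if unknown:
        raise ValueError(f"labels outside the 62 classes: {unknown}")
    if not actual:
        return 0.0
    # Exact comparison: "a" and "A" are different classes.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative values only, not taken from the dataset.
pred_chars = ["a", "B", "0", "o"]
true_chars = ["A", "B", "0", "O"]
print(len(CHAR_CLASSES))                      # 62
print(char_accuracy(pred_chars, true_chars))  # 0.5
```

Unlike the word-level measure, this comparison penalizes case confusions such as "o" versus "O", which are common for characters whose upper- and lowercase shapes look alike.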


