IIIT 5K-Word (IIIT 5K-Word)

2015-02-18 (v. 1)

Contact author

Anand Mishra

IIIT Hyderabad

anand.mishra@research.iiit.ac.in

+919160450921

+91406653 1413

You can cite this dataset as: Anand Mishra, IIIT 5K-Word (IIIT 5K-Word), 1, ID: IIIT 5K-Word_1, URL: https://tc11.cvc.uab.es/datasets/IIIT%205K-Word_1

Dataset Information

Dataset URL

http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html

Keywords

Scene text recognition, scene character recognition

Description

The IIIT 5K-Word dataset was harvested from Google image search. Query words such as billboards, signboards, house numbers, house name plates, and movie posters were used to collect the images. The dataset contains 5000 cropped word images taken from scene text and born-digital images, divided into a training set of 2000 images and a test set of 3000 images. It can be used for cropped word recognition with large lexicons. We also provide a lexicon of more than 0.5 million dictionary words with this dataset.
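
Lexicon-based recognition usually means the raw output of a recognizer is mapped to the closest word in a supplied lexicon. The sketch below shows one common way to do that, picking the lexicon entry with the smallest Levenshtein (edit) distance to the prediction. It is an illustration under assumptions, not the method used by the dataset authors: the function names `levenshtein` and `snap_to_lexicon` are hypothetical, and the case-insensitive comparison is an assumed convention.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def snap_to_lexicon(prediction: str, lexicon: list[str]) -> str:
    """Hypothetical helper: return the lexicon word closest to the prediction.

    Comparison is case-insensitive (an assumption); ties go to the word
    that appears first in the lexicon, because min() keeps the first minimum.
    """
    pred = prediction.lower()
    return min(lexicon, key=lambda word: levenshtein(pred, word.lower()))


if __name__ == "__main__":
    print(snap_to_lexicon("H0USE", ["house", "horse", "mouse"]))
```

Here "H0USE" is one substitution away from "house" and two away from the other candidates, so "house" is chosen. A brute-force scan like this is fine for small per-image lexicons; for a lexicon of 0.5 million words, an index such as a BK-tree is a common way to avoid comparing against every entry.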

Technical Details

The dataset contains:

(i) Cropped word images split into train and test sets

(ii) Ground-truth word annotations, and small- and medium-size lexicons

(iii) Large lexicon from Jerod J. Weinman, Erik Learned-Miller, Allen R. Hanson, "Scene text recognition using similarity and a lexicon with sparse belief propagation", TPAMI, pp. 1733-1746, 2009

(iv) Character bounding box annotations
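
With ground-truth word annotations for the test set, recognition results are typically reported as word accuracy: the fraction of images whose predicted word exactly matches the annotation. The sketch below computes that figure. It assumes predictions and ground truth are plain Python lists in the same image order, and it compares words case-insensitively, a convention many papers follow but which you should confirm for your own evaluation protocol. The function name `word_accuracy` is hypothetical.

```python
def word_accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    """Fraction of images whose predicted word equals the ground-truth word.

    Words are compared case-insensitively (an assumed convention).
    Both lists must be aligned: predictions[i] belongs to ground_truth[i].
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("no images to evaluate")
    correct = sum(p.lower() == g.lower() for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


if __name__ == "__main__":
    preds = ["HOUSE", "Pizza", "cafe"]
    gts = ["house", "PIZZA", "CAFE1"]
    print(word_accuracy(preds, gts))
```

In this example two of the three predictions match after lowercasing, so the accuracy is 2/3. Raising an error on mismatched lengths guards against silently truncated results, since `zip` would otherwise stop at the shorter list.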
