The Street View Text Dataset (SVT)

Ground Truth

Image annotation

2014-01-13 (v. 1)

Contact author

Kai Wang

Department of Comp. Sci. and Engr., University of California, San Diego



Workers are presented with an image and a list of candidate words to label with bounding boxes. Unlike the ICDAR Robust Reading data set, we only label words associated with businesses. We used Alex Sorokin's Annotation Toolkit to support bounding box image annotation. For each image, we obtained a list of local business names by running a Search Nearby:* query in Google Maps at the image's address. We stored the top 20 business results for each image, typically yielding 50 unique words. To summarize, the SVT data set consists of images collected from Google Street View, where each image is annotated with bounding boxes around words from the names of businesses near the location where the image was taken.
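The step that turns the top 20 business results into a per-image list of candidate words can be sketched as follows. This is a minimal illustration, not the procedure used to build SVT: the helper name `build_lexicon`, the tokenization rule (runs of letters, digits and apostrophes) and the upper-casing are all assumptions, and the list of business names is taken as given rather than fetched from Google Maps.

```python
import re


def build_lexicon(business_names, top_k=20):
    """Hypothetical helper: collect unique words from the top_k business names.

    Tokenization (letters, digits, apostrophes) and upper-casing are
    assumptions made for illustration only.
    """
    seen = set()
    lexicon = []
    for name in business_names[:top_k]:
        # Split each business name into word-like tokens.
        for word in re.findall(r"[A-Za-z0-9']+", name):
            word = word.upper()
            # Keep only the first occurrence so the lexicon has unique words.
            if word not in seen:
                seen.add(word)
                lexicon.append(word)
    return lexicon
```

For example, `build_lexicon(["Joe's Pizza", "Pizza Hut", "Bank of America"])` returns `["JOE'S", "PIZZA", "HUT", "BANK", "OF", "AMERICA"]`; the repeated "Pizza" is kept only once, which is why 20 business names shrink to a smaller set of unique words.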

The annotations are in XML, using tags similar to those from the ICDAR 2003 Robust Reading Competition.
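Reading the annotations can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names used here (`image`, `imageName`, a comma-separated `lex` lexicon, `taggedRectangle` with `x`/`y`/`width`/`height` attributes and a nested `tag`) are assumptions based on the ICDAR 2003 style, not a specification of the SVT files; check them against the actual XML before relying on this. `WordBox` and `parse_svt` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class WordBox:
    """One labeled word and its axis-aligned bounding box (pixels)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def parse_svt(xml_text):
    """Hypothetical parser returning {image_name: (lexicon, [WordBox, ...])}.

    Assumed layout: <tagset><image><imageName/><lex/>
    <taggedRectangles><taggedRectangle x y width height><tag/>...
    """
    root = ET.fromstring(xml_text)
    result = {}
    for image in root.iter("image"):
        name = image.findtext("imageName", default="").strip()
        # Assumed: the candidate word list is stored comma-separated in <lex>.
        lex_text = image.findtext("lex", default="")
        lexicon = [w.strip() for w in lex_text.split(",") if w.strip()]
        boxes = []
        for rect in image.iter("taggedRectangle"):
            boxes.append(WordBox(
                text=rect.findtext("tag", default="").strip(),
                x=int(rect.get("x", 0)),
                y=int(rect.get("y", 0)),
                width=int(rect.get("width", 0)),
                height=int(rect.get("height", 0)),
            ))
        result[name] = (lexicon, boxes)
    return result
```

Keeping the lexicon next to the boxes for each image reflects the setup described above, where each image comes with its own list of candidate business words.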

