The large scene video text dataset for scene video text spotting (LSVTD)

2021-06-01 (v. 1)

Contact author

Baorui Zou

Hikvision Research Institute

(+86) 18826072052

You can cite this dataset as: Baorui Zou, The large scene video text dataset for scene video text spotting (LSVTD), 1, ID: LSVTD_1, URL:

Dataset Information


scene video text detection, scene video text tracking, scene video text recognition


You can obtain the dataset in 3 different ways:

The dataset contains 129 video clips (ranging from several seconds to over a minute long) from 21 real-life scenes. It extends the original 100-video LSVTD dataset by adding 15 videos for the 'harbor surveillance' scenario and 14 videos for the 'train watch' scenario, with the goal of addressing video text spotting in industrial transportation applications.

[Figure: dataset distribution]

The dataset has 4 major characteristics: 

1) large scale: it contains 129 video clips, larger than most existing scene video text datasets

2) diversified scenes: it covers 21 indoor and outdoor real-life scenarios (see the dataset distribution figure)

3) different capture devices: videos were collected with several kinds of video cameras:

  • mobile phone cameras in various indoor scenarios (e.g. bookstore and office building) and outdoor street views;
  • HD cameras in traffic and harbor surveillance;
  • car DVR (dashcam) cameras in fast-moving outdoor scenarios (e.g. city road, highway).

4) multilingual instances: the dataset contains text in multiple languages, grouped into 2 major categories: alphanumeric and non-alphanumeric.
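This page does not state how a text instance is assigned to one of the two language categories. As a minimal sketch, assuming a simple hypothetical rule (a transcription that is pure ASCII and contains at least one letter or digit counts as alphanumeric; anything else, such as Chinese text, counts as non-alphanumeric), per-category counts over a list of transcriptions could be tallied as follows. The names `categorize_transcription` and `count_categories` are illustrative only and are not part of any official LSVTD tooling.

```python
from collections import Counter
from typing import Iterable


def categorize_transcription(text: str) -> str:
    # Hypothetical rule, not taken from the dataset documentation:
    # pure-ASCII text with at least one letter or digit is "alphanumeric";
    # everything else (e.g. Chinese characters, mixed scripts) is
    # "non-alphanumeric".
    if text.isascii() and any(c.isalnum() for c in text):
        return "alphanumeric"
    return "non-alphanumeric"


def count_categories(transcriptions: Iterable[str]) -> Counter:
    # Tally how many transcriptions fall into each assumed category.
    return Counter(categorize_transcription(t) for t in transcriptions)


if __name__ == "__main__":
    sample = ["EXIT", "G15", "上海", "A-12"]
    print(count_categories(sample))
```

Under this assumed rule, mixed strings such as "G15高速" fall into the non-alphanumeric category; a different split (for example, by dominant script) would require changing only `categorize_transcription`.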


