The large scene video text dataset for scene video text spotting (LSVTD)

Dataset Information
Keywords
scene video text detection, scene video text tracking, scene video text recognition
Description
The dataset is available from three sources:
- https://pan.baidu.com/s/10K2Hs8sT2vN5DSHd_F-fuQ (train and val, fetch code: t69a) and https://pan.baidu.com/s/15hSGAFtDeas1vB8JgIWkJg (test, fetch code: rft3)
- https://drive.google.com/drive/folders/1AGwXQuoK_w95wlkZHlJmR4YKjFqaWyyn?usp=sharing
- https://sjtueducn-my.sharepoint.com/:f:/g/personal/dmlab_gds_sjtu_edu_cn/EhNnEM8HdjtGiINaeMiety4BlweKExoo3GslQ2BU9eYS_Q?e=uhJ4yd
The dataset contains 129 video clips (ranging from several seconds to over 1 minute long) from 21 real-life scenes. It extends the original LSVTD dataset by adding 15 videos for the 'harbor surveillance' scenario and 14 videos for the 'train watch' scenario, with the aim of addressing video text spotting in industrial transportation applications.
Figure: dataset distribution
The dataset has 4 major characteristics:
1) large scale: it contains 129 video clips, making it larger than most existing scene video text datasets
2) diversified scenes: it covers 21 indoor and outdoor real-life scenarios (see the dataset distribution figure)
3) different capture devices: videos are collected with several kinds of cameras:
- mobile phone cameras in various indoor scenarios (e.g. bookstore and office building) and outdoor street views;
- HD cameras in traffic and harbor surveillance;
- Car-DVR cameras in fast-moving outdoor scenarios (e.g. city road, highway).
4) multilingual instances: the dataset contains text in multiple languages, divided into 2 major categories: alphanumeric and non-alphanumeric.
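To illustrate the alphanumeric / non-alphanumeric split, here is a minimal Python sketch. It is not the dataset's official labeling rule: it assumes that "alphanumeric" means a transcription made up only of ASCII letters and digits (ASCII punctuation and whitespace are ignored), and the function name `text_category` is hypothetical.

```python
import string

# Assumption: ASCII punctuation and whitespace do not affect the category.
_IGNORED = set(string.punctuation + string.whitespace)


def text_category(transcription: str) -> str:
    """Return 'alphanumeric' or 'non-alphanumeric' for one text instance.

    Assumed rule (not taken from the dataset documentation): an instance is
    alphanumeric when every remaining character is an ASCII letter or digit.
    Instances with no letters or digits at all fall into 'non-alphanumeric'.
    """
    chars = [c for c in transcription if c not in _IGNORED]
    if chars and all(c.isascii() and c.isalnum() for c in chars):
        return "alphanumeric"
    return "non-alphanumeric"


if __name__ == "__main__":
    for sample in ["EXIT 24", "出口", "A1-出口"]:
        print(sample, "->", text_category(sample))
```

Under this assumed rule, an instance that mixes scripts (such as Latin letters next to Chinese characters) is counted as non-alphanumeric; the actual annotations may treat such cases differently.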
Figure: example