The large scene video text dataset for scene video text spotting (LSVTD)

Research Tasks

Video Text Tracking

2021-06-01 (v. 1)

Contact author

Baorui Zou

Hikvision Research Institute

(+86) 18826072052

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.


TASK 2 - Video Text Tracking

This task aims to track all text streams in the videos.


We evaluate results using an adaptation of the CLEAR-MOT [1] and VACE [2] evaluation frameworks. Following the protocols in [3], these metrics are adapted to the specifics of text tracking: MOTA, MOTP, and ATA are used as the evaluation metrics.
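
As a rough illustration of how two of these metrics are computed, the sketch below follows the general CLEAR-MOT definitions in [1]. It is not the official evaluation code for this task. It assumes that predictions have already been matched to ground truth in every frame, and that MOTP is measured as the mean overlap (IoU) of matched pairs; the matching rule and overlap threshold are not specified here. The FrameStats container and all field names are hypothetical. ATA is omitted because it needs sequence-level track matching.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameStats:
    """Per-frame counts after matching (hypothetical container, not part of LSVTD)."""
    num_gt: int            # ground-truth text objects in the frame
    misses: int            # ground-truth objects with no matched prediction
    false_positives: int   # predictions with no matched ground truth
    id_switches: int       # matched objects whose track ID changed
    overlaps: List[float]  # IoU of each matched pair (assumed overlap measure)


def mota(frames: List[FrameStats]) -> float:
    """MOTA = 1 - (misses + false positives + ID switches) / total ground truth."""
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(f.misses + f.false_positives + f.id_switches for f in frames)
    return 1.0 - errors / total_gt


def motp(frames: List[FrameStats]) -> float:
    """MOTP as the mean overlap over all matched pairs in all frames."""
    overlaps = [o for f in frames for o in f.overlaps]
    if not overlaps:
        raise ValueError("MOTP is undefined without matched pairs")
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    # Two made-up frames, for illustration only.
    frames = [
        FrameStats(num_gt=4, misses=1, false_positives=0, id_switches=0,
                   overlaps=[0.8, 0.6, 0.7]),
        FrameStats(num_gt=4, misses=0, false_positives=1, id_switches=1,
                   overlaps=[0.9, 0.5, 0.7, 0.8]),
    ]
    print(f"MOTA = {mota(frames):.3f}")
    print(f"MOTP = {motp(frames):.3f}")
```

Because errors are summed over the whole sequence before dividing, MOTA can be negative when a tracker makes more errors than there are ground-truth objects.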




[1]. K. Bernardin and R. Stiefelhagen, “Evaluating multiple object tracking performance: The CLEAR MOT metrics,” EURASIP Journal on Image and Video Processing, vol. 2008, May 2008.

[2]. R. Kasturi, D. Goldgof, P. Soundararajan, V. Manohar, J. Garofolo, R. Bowers, M. Boonstra, V. Korzhova, and J. Zhang, “Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 319–336, 2009.

[3]. D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. Gomez i Bigorda, S. Robles Mestre, J. Mas, D. Fernandez Mota, J. Almazan Almazan, and L. P. de las Heras, “ICDAR 2013 robust reading competition,” in ICDAR, 2013, pp. 1484–1493.

