This task is to localize text in street view images at the text-line level, using bounding boxes or polygons.


The text detection task of LSVT is evaluated in terms of Precision, Recall and F-score at IoU thresholds of 0.5 and 0.7. Only the F-score at 0.5 is used as the primary metric for the final ranking. A detected text line is counted as a true positive if its region has an IoU greater than 0.5 with a ground-truth box. If multiple detections match the same ground truth, only the detection with the highest IoU is counted as a true positive, and the remaining matches are counted as false positives. "Do not care" ground truths, whether detected or missed, do not contribute to the evaluation result.
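The matching rule above can be sketched in Python. This is a minimal illustration, not the official LSVT evaluation script. It assumes axis-aligned boxes `(x_min, y_min, x_max, y_max)` instead of quadrangles or polygons (a real implementation would compute polygon IoU, for example with a geometry library). The functions `iou` and `evaluate` are hypothetical names. It also assumes one-to-one greedy matching in descending IoU order, and that detections matched to "Do not care" ground truths are removed from the precision denominator.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max). The real task uses
# quadrangles/polygons; axis-aligned boxes keep this sketch dependency-free.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(dets: List[Box], gts: List[Box], dont_care: List[bool],
             thresh: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, f_score) at the given IoU threshold."""
    # All detection/ground-truth pairs, highest IoU first (assumed greedy order).
    pairs = sorted(
        ((iou(d, g), di, gi) for di, d in enumerate(dets) for gi, g in enumerate(gts)),
        reverse=True,
    )
    matched_det, matched_gt = set(), set()
    tp = 0
    ignored_dets = 0  # detections matched to "Do not care" ground truths
    for score, di, gi in pairs:
        if score <= thresh:  # a match needs IoU strictly greater than the threshold
            break
        if di in matched_det or gi in matched_gt:
            continue  # lower-IoU duplicate; the detection stays unmatched (false positive)
        matched_det.add(di)
        matched_gt.add(gi)
        if dont_care[gi]:
            ignored_dets += 1
        else:
            tp += 1
    # Unmatched detections count as false positives; missed "Do not care"
    # ground truths are not counted as misses.
    num_dets = len(dets) - ignored_dets
    num_gts = sum(1 for dc in dont_care if not dc)
    precision = tp / num_dets if num_dets else 0.0
    recall = tp / num_gts if num_gts else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


# Toy example: gt 2 is "Do not care"; det 1 duplicates det 0 on gt 0;
# det 2 overlaps gt 1 with IoU 0.6 (passes 0.5, fails 0.7).
gts = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
dont_care = [False, False, True]
dets = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 0, 30, 6), (40, 0, 50, 10)]

print([round(x, 3) for x in evaluate(dets, gts, dont_care, 0.5)])  # [0.667, 1.0, 0.8]
print([round(x, 3) for x in evaluate(dets, gts, dont_care, 0.7)])  # [0.333, 0.5, 0.4]
```

In this toy case the duplicate detection on the first ground truth becomes a false positive, and the detection on the "Do not care" box is ignored rather than counted either way.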

The expected detection results are the locations of all text instances at the text-line level, given as quadrangles or polygons. There is no limit on the length of the detection output.


