The main objective of this task is to detect the location of every text instance in an input image, as in all previous RRC scene text detection tasks. The input is strictly limited to the image itself; no other form of input may be used to aid the model in detecting text instances.


Following CTW1500 [1], this task adopts an IoU-based evaluation protocol. The protocol is threshold-based, with 0.5 as the default IoU threshold, and the H-Mean at the 0.5 threshold is used for ranking. In the case of multiple matches, only the detection region with the highest IoU is considered; the remaining matches are counted as False Positives.
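The matching rule above can be illustrated with a minimal sketch. It is not the official evaluation script: the function name `hmean_at_threshold` is hypothetical, the pairwise IoU values are assumed to be precomputed (for example with a polygon geometry library), and the greedy one-to-one assignment is only one plausible reading of "the detection region with the highest IoU".

```python
def hmean_at_threshold(iou, threshold=0.5):
    """Compute (precision, recall, H-Mean) from a precomputed IoU matrix.

    iou[g][d] is the IoU between ground-truth instance g and detection d.
    Assumption: every row has the same length (one entry per detection).
    """
    num_gt = len(iou)
    num_det = len(iou[0]) if num_gt else 0

    # Candidate matches are all pairs at or above the threshold,
    # visited from the highest IoU to the lowest.
    pairs = sorted(
        (
            (iou[g][d], g, d)
            for g in range(num_gt)
            for d in range(num_det)
            if iou[g][d] >= threshold
        ),
        reverse=True,
    )

    matched_gt, matched_det = set(), set()
    for _, g, d in pairs:
        # Keep only the best pair per ground truth and per detection;
        # any other detection overlapping the same ground truth stays
        # unmatched and therefore counts as a false positive.
        if g not in matched_gt and d not in matched_det:
            matched_gt.add(g)
            matched_det.add(d)

    true_positives = len(matched_gt)
    precision = true_positives / num_det if num_det else 0.0
    recall = true_positives / num_gt if num_gt else 0.0
    denom = precision + recall
    hmean = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, hmean


if __name__ == "__main__":
    # Two ground-truth instances, three detections (made-up IoU values).
    example = [
        [0.8, 0.6, 0.1],
        [0.0, 0.2, 0.4],
    ]
    print(hmean_at_threshold(example))
```

In this made-up example, detections 0 and 1 both overlap ground truth 0 above the threshold; only detection 0 (the higher IoU) is kept as a match, so detection 1 lowers precision without raising recall.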

The expected detection result is the spatial location of every text instance, at word level for Latin scripts and at line level for Chinese scripts, arranged as a coordinate sequence (x0, y0, x1, y1, ..., xn, yn). There is no limitation on the length of the detection output.
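As a small illustration of the coordinate ordering described above, the sketch below converts between a list of (x, y) points and the flat x0, y0, x1, y1, ..., xn, yn sequence. The helper names `flatten_points` and `unflatten_points` are hypothetical, and the sketch says nothing about the actual submission file format, which is not specified here.

```python
def flatten_points(points):
    """Turn [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    flat = []
    for x, y in points:
        flat.extend((x, y))
    return flat


def unflatten_points(flat):
    """Turn [x0, y0, x1, y1, ...] back into [(x0, y0), (x1, y1), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("coordinate sequence must contain an even number of values")
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


if __name__ == "__main__":
    # A made-up six-point outline; any number of points is handled the same way.
    outline = [(10, 20), (60, 15), (110, 25), (108, 55), (60, 45), (12, 50)]
    flat = flatten_points(outline)
    print(flat)
    print(unflatten_points(flat) == outline)
```

Because the sequence length is not fixed, the same helpers work for a four-point box or for an outline with many more points.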

