Dataset for the competition on Post-OCR Text Correction 2019 (Post-OCR 2019)
Detection of OCR errors
Given the raw OCR-ed text, participants are asked to provide the position and the length of each suspected error. The length information is non-trivial: although it can often be recovered from word boundaries, it varies in some cases (e.g. wrongly OCR-ed separators such as spaces, hyphens or line breaks).
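As an illustration of why the length matters, the sketch below represents each suspected error as a (position, length) pair and extracts the flagged substring. The use of character offsets, the `extract_spans` helper and the sample text are assumptions made for this example only; the competition's actual submission format is not described here and may differ.

```python
from typing import List, Tuple

# Hypothetical representation: (position, length), both counted in characters.
Span = Tuple[int, int]


def extract_spans(text: str, spans: List[Span]) -> List[str]:
    """Return the substring of `text` covered by each (position, length) span."""
    extracted = []
    for position, length in spans:
        # Reject spans that fall outside the text or are empty.
        if position < 0 or length <= 0 or position + length > len(text):
            raise ValueError(f"span ({position}, {length}) is out of range")
        extracted.append(text[position:position + length])
    return extracted


# Made-up OCR output: "quick" was split by a spurious space, "fox" has a digit.
ocr_text = "The qu ick brown f0x"

# The first span crosses a wrongly OCR-ed separator, so its length cannot be
# recovered from word boundaries alone; the second matches a single word.
suspected_errors = [(4, 6), (17, 3)]

flagged = extract_spans(ocr_text, suspected_errors)
print(flagged)
```

In this made-up case, flagging only "qu" or only "ick" would miss part of the error, which is why a span length is reported alongside the position.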
The detection task is evaluated on precision, recall and F-measure, as it is purely a matter of tokens being truly erroneous or not. The ranking will be made on the F-measure.
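A minimal sketch of these three measures follows, treating detection as a comparison between the set of token positions flagged by a participant and the set of truly erroneous ones. The function name `detection_scores`, the set-of-positions representation and the balanced (F1) form of the F-measure are assumptions for illustration, not the official evaluation script.

```python
from typing import Set, Tuple


def detection_scores(predicted: Set[int], actual: Set[int]) -> Tuple[float, float, float]:
    """Compute (precision, recall, F-measure) over sets of token positions."""
    true_positives = len(predicted & actual)
    # Precision: share of flagged tokens that are truly erroneous.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly erroneous tokens that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    # Balanced F-measure (harmonic mean); assumed here, not confirmed by the text.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: three real errors, four flagged positions, two in common.
truly_erroneous = {2, 5, 9}
flagged = {2, 5, 7, 11}

precision, recall, f_measure = detection_scores(flagged, truly_erroneous)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

Returning 0.0 for empty inputs is a design choice of this sketch that avoids division by zero; an official scorer may handle those cases differently.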
| ICDAR2019_POCR_report_URL.pdf | article | 216 KB | 2 | Final report of the competition |