Dataset for the competition on Post-OCR Text Correction 2019 (Post-OCR 2019) (Post-OCR_2019)

Research Tasks

Detection of OCR errors

2019-10-20 (v. 1)

Contact author

Christophe Rigaud

L3i - University of La Rochelle

+33 5 46 45 82 62

+33 5 46 45 82 42


Given the raw OCR-ed text, participants are asked to provide the position and the length of each suspected error. The length information is non-trivial: although it can often be recovered from word boundaries, it may differ in some cases (e.g. when separators such as spaces, hyphens or line breaks are wrongly OCR-ed).
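To make the expected output concrete, here is a minimal sketch of a detector that reports each suspected error as a (position, length) pair over the raw OCR-ed text. It is only an illustration: the lexicon lookup, the function name `detect_suspected_errors` and the whitespace tokenization are assumptions made for this example, not the competition's method or submission format.

```python
import re

def detect_suspected_errors(ocr_text, lexicon):
    """Return (position, length) pairs for tokens absent from the lexicon.

    Hypothetical baseline: tokens are whitespace-delimited, so the length
    is recovered from word boundaries only.
    """
    errors = []
    for match in re.finditer(r"\S+", ocr_text):
        token = match.group()
        # Ignore surrounding punctuation when looking the token up.
        core = token.strip(".,;:!?\"'()")
        if core and core.lower() not in lexicon:
            # Position is the character offset in the raw OCR-ed text.
            errors.append((match.start(), len(token)))
    return errors

ocr_text = "The qnick brown fox"
lexicon = {"the", "quick", "brown", "fox"}  # toy lexicon, for illustration only
errors = detect_suspected_errors(ocr_text, lexicon)
print(errors)  # [(4, 5)]
```

A word-boundary approach like this one cannot handle wrongly OCR-ed separators: for the input "qu ick" it reports two separate errors, whereas a single error spanning both fragments and the spurious space would be the more faithful answer.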


The detection task is evaluated using recall, precision and F-measure, since it is purely a matter of whether tokens are truly erroneous or not. The ranking is based on the F-measure.
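The three measures can be sketched as follows. This assumes that predicted and reference errors are compared as sets of token start positions and that the F-measure is the balanced F1; the official evaluation script may match tokens differently, and the function name `detection_scores` is hypothetical.

```python
def detection_scores(predicted, gold):
    """Compute (precision, recall, F-measure) over sets of error positions."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Toy example: 3 predicted positions, 4 reference positions, 2 in common.
precision, recall, f_measure = detection_scores({4, 16, 30}, {4, 30, 42, 57})
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```

In this toy case precision is 2/3, recall is 1/2, and the F-measure is 4/7.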

ICDAR2019_POCR_report_URL.pdf (article, 216 KB): Final report of the competition

