Dataset for the competition on Post-OCR Text Correction 2019 (Post-OCR 2019) (Post-OCR_2019)
Research Tasks
Detection of OCR errors
2019-10-20 (v. 1)
Description
Given the raw OCR-ed text, participants are asked to provide the position and the length of each suspected error. The length is non-trivial: although it can often be recovered from word boundaries, it may differ in some cases (e.g. wrongly OCR-ed separators such as spaces, hyphens or line breaks).
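To illustrate why the length cannot always be read off a single word, here is a minimal sketch. The `ErrorSpan` type, the sample string, and the 0-based character-offset convention are assumptions made for this example only; they are not the competition's submission format.

```python
from typing import NamedTuple


class ErrorSpan(NamedTuple):
    """Hypothetical representation of one suspected OCR error."""
    position: int  # assumed: 0-based character offset where the error starts
    length: int    # assumed: number of characters the error covers


def extract(text: str, span: ErrorSpan) -> str:
    """Return the substring of `text` covered by `span`."""
    return text[span.position:span.position + span.length]


# Invented sample: "quick" was split by a spurious space, "fox" has a digit.
ocr_text = "the qu ick brown f0x"
spans = [
    ErrorSpan(position=4, length=6),   # covers "qu ick", crossing a word boundary
    ErrorSpan(position=17, length=3),  # covers "f0x", a single word
]
flagged = [extract(ocr_text, s) for s in spans]
```

The first span shows the separator case: a wrongly inserted space makes one error cover what looks like two words, so a length derived from word boundaries alone would be too short.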
Protocol
The detection task is evaluated on recall, precision and F-measure, as it is purely a matter of whether tokens are truly erroneous or not. The ranking is based on the F-measure.
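The three metrics can be sketched as follows. This is an illustration rather than the official evaluation script: the function name `detection_scores`, the use of token indices as identifiers, and the choice of the balanced F-measure (harmonic mean of precision and recall) are assumptions.

```python
def detection_scores(predicted: set[int], actual: set[int]) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) for token-level error detection.

    `predicted` holds the indices of tokens flagged as erroneous,
    `actual` the indices of tokens that are truly erroneous.
    """
    true_positives = len(predicted & actual)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Assumed: balanced F-measure, i.e. the harmonic mean of the two.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented example: 4 tokens flagged, 3 truly erroneous, 2 in common.
precision, recall, f_measure = detection_scores({0, 3, 7, 9}, {3, 7, 8})
```

With these invented inputs, two of the four flagged tokens are correct (precision 1/2) and two of the three true errors are found (recall 2/3), giving an F-measure of 4/7.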
File | Type | Size | Downloads | Description |
---|---|---|---|---|
ICDAR2019_POCR_report_URL.pdf | article | (216 KB) | 3 | Final report of the competition |