Dataset for the competition on Post-OCR Text Correction 2019 (Post-OCR 2019)

Research Tasks

Detection of OCR errors

2019-10-20 (v. 1)

Contact author

Christophe Rigaud

L3i - University of La Rochelle

christophe.rigaud@univ-lr.fr

+33 5 46 45 82 62

+ 33 5 46 45 82 42

Description

Given the raw OCR-ed text, participants are asked to provide the position and the length of each suspected error. The length information is non-trivial: although it can often be recovered from word boundaries, it may differ in some cases (e.g. wrongly OCR-ed separators such as spaces, hyphens or line breaks).
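As an illustration of what a (position, length) output could look like, here is a minimal sketch that flags tokens missing from a word list. The function name `detect_errors`, the `lexicon` argument and the lexicon-lookup heuristic are assumptions made for this example; they are not the competition's method or submission format.

```python
import re


def detect_errors(ocr_text, lexicon):
    """Return (position, length) pairs for tokens absent from the lexicon.

    Hypothetical helper: a plain lexicon lookup stands in for a real
    error detector, and tokens are taken to be runs of word characters.
    """
    spans = []
    for match in re.finditer(r"\w+", ocr_text):
        token = match.group()
        if token.lower() not in lexicon:
            # Position is the character offset in the raw OCR-ed text.
            spans.append((match.start(), len(token)))
    return spans


lexicon = {"the", "quick", "brown", "fox"}
spans = detect_errors("Tbe quick brovvn fox", lexicon)
print(spans)  # [(0, 3), (10, 6)]
```

This word-boundary approach is exactly the case where the length is easy to recover. It would not handle a wrongly inserted space or hyphen, where the erroneous span covers more than one such token.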

Protocol

The detection task is evaluated on recall, precision and F-measure, as it is purely a matter of whether tokens are truly erroneous or not. The ranking will be based on the F-measure.
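The three measures can be sketched as follows, assuming each erroneous token is identified by its start position and that a prediction counts only when it exactly matches a reference position. The function name `detection_scores` and this matching rule are assumptions for illustration; the official evaluation may match tokens differently.

```python
def detection_scores(predicted, reference):
    """Return (precision, recall, f_measure) for detected error positions.

    Hypothetical helper: `predicted` and `reference` are iterables of
    token start positions; exact position matching is assumed.
    """
    predicted = set(predicted)
    reference = set(reference)
    true_positives = len(predicted & reference)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


p, r, f = detection_scores(predicted=[0, 10, 17], reference=[0, 10, 25, 30])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Here two of the three predictions are correct (precision 2/3) and two of the four true errors are found (recall 1/2), giving an F-measure of 4/7.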

File | Type | Size | Downloads | Description
ICDAR2019_POCR_report_URL.pdf | article | 216 KB | 1 | Final report of the competition
