Multiply oriented and curved handwritten text line dataset (VML-MOC)

2019-11-25 (v. 1)

Contact author

Irina Rabaev

Sami Shamoon College of Engineering, Beer Sheva, Israel

irinar@ac.sce.ac.il

+972-8-6475620

You can cite this dataset as: Irina Rabaev, Multiply oriented and curved handwritten text line dataset (VML-MOC), 1, ID: VML-MOC_1, URL: https://tc11.cvc.uab.es/datasets/VML-MOC_1

Dataset Information

Dataset URL

https://www.cs.bgu.ac.il/~berat/data/moc_dataset.zip

Keywords

curved and skewed text lines, Arabic historical documents, historical documents

Description

VML-MOC (Visual Media Lab - Multiply Oriented and Curved) [1] is a benchmark dataset of naturally occurring handwritten text lines that are heavily skewed and curved. These text lines were written as remarks on the page margins by different writers over the years. They appear at different locations on the page, either with orientations ranging between 0° and 180° or in curvilinear forms.

The document images in VML-MOC contain only binarized side notes. Researchers can therefore focus on extracting multiply oriented and curved text lines without having to deal with page segmentation, the heterogeneity of side-text and main-text areas, or binarization defects.

The dataset consists of 30 document images divided into a training set (20 pages) and a test set (10 pages).

The ground truth is provided in three forms: raw pixel labeling, DIVA pixel labeling, and PAGE XML files.
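As an illustration of how the PAGE XML ground truth might be consumed, the following is a minimal Python sketch that collects one polygon per text line. It rests on assumptions not stated on this page: that each TextLine element carries a Coords child, and that the outline is stored either in a points attribute (newer PAGE schema versions) or as Point child elements (older versions). The function name read_text_line_polygons and the embedded sample document are hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so the code does not depend on a schema version."""
    return tag.rsplit("}", 1)[-1]


def read_text_line_polygons(xml_text):
    """Return {text line id: [(x, y), ...]} for every TextLine in a PAGE XML string."""
    root = ET.fromstring(xml_text)
    polygons = {}
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Only look at direct children, so region-level Coords are ignored.
        for child in elem:
            if local_name(child.tag) != "Coords":
                continue
            points_attr = child.get("points")
            if points_attr:
                # Assumed newer layout: points="x1,y1 x2,y2 ..."
                points = [
                    tuple(int(float(v)) for v in pair.split(","))
                    for pair in points_attr.split()
                ]
            else:
                # Assumed older layout: <Point x=".." y=".."/> children
                points = [
                    (int(float(p.get("x"))), int(float(p.get("y"))))
                    for p in child
                    if local_name(p.tag) == "Point"
                ]
            polygons[elem.get("id")] = points
    return polygons


# Hypothetical sample document, written by hand for this sketch.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r0">
      <Coords points="0,0 100,0 100,100 0,100"/>
      <TextLine id="l0">
        <Coords points="10,10 60,20 55,35 8,25"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

polygons = read_text_line_polygons(SAMPLE)
```

Matching on local element names rather than fully qualified names is a deliberate choice here, since the PAGE schema version used by the dataset files is not given on this page. To read a file from disk, pass its contents to the function as a string.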

 

 

References

[1] B. Kurar, R. Cohen, I. Rabaev, and J. El-Sana. VML-MOC: Segmenting a multiply oriented and curved handwritten text lines dataset. In the 3rd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), pp. 13-18, 2019.
