ICFHR2018 Competition on Vietnamese Online Handwritten Text Recognition Database (HANDS-VNOnDB2018)

2018-02-21 (v. 1)

Contact author

NGUYEN TUAN HUNG

Tokyo University of Agriculture and Technology

ntuanhung@gmail.com

+81-423-88-7144

You can cite this dataset as: NGUYEN TUAN HUNG, ICFHR2018 Competition on Vietnamese Online Handwritten Text Recognition Database (HANDS-VNOnDB2018), 1, ID: HANDS-VNOnDB2018_1, URL: http://tc11.cvc.uab.es/datasets/HANDS-VNOnDB2018_1

Dataset Information

Keywords

Vietnamese, Online Handwriting Database, ICFHR, recognition, competition

Description

HANDS-VNOnDB2018 (VNOnDB2018 for short [1]) is used for the ICFHR2018 Competition on Vietnamese Online Handwritten Text Recognition using VNOnDB. It provides 1,146 Vietnamese paragraphs of handwritten text comprising 7,296 lines, more than 480,000 strokes, and more than 380,000 characters, written by 200 Vietnamese writers.

Writers were asked to freely write ground-truth text drawn from a corpus of Vietnamese text. The ground-truth text is derived from the VieTreeBank corpus, which is based on Vietnamese newspapers and therefore contains all Vietnamese characters and some special symbols. To collect the patterns, we used Fujitsu Tablet PCs (FMVT8170) with a stylus pen at a high sampling rate (120 Hz). Each sequence contains multiple lines with various delayed strokes. VNOnDB2018 is available for research purposes only. For commercial use or for use in products for sale, please contact us by email.

 

Technical Details

Each InkML file in VNOnDB2018 has the following structure. There are two main sections: a description section (including description, content_category, language, writer index, gender, age, ...) and a trajectory data section (including multiple "traceGroup" elements). Each "traceGroup" element contains the ground-truth text in a "Tg_Truth" tag and stroke data in "trace" elements, where each stroke is represented by the x- and y-coordinates of its points.

<ink>
  <annotationXML>
    <Description>Cursive online handwriting</Description>
    <Content_Category>Text</Content_Category>
    <Language>Vietnamese</Language>
    <Writer_ID>id_xxxx</Writer_ID>
    <Gender>Male</Gender>
    <Age>22</Age>
    <Dominant_Hand>Left</Dominant_Hand>
    <Writing_Hand>Left</Writing_Hand>
    <Job>Student</Job>
    <Native_Language>Vietnamese</Native_Language>
    <Start_Time>2014-06-03T16:26:20</Start_Time>
    <DevName>FujitsuTabletPC</DevName>
    <SamplingRate>120</SamplingRate>
    <MaxNormalPressure>255</MaxNormalPressure>
    <Gt_File_Name>BCCTC</Gt_File_Name>
  </annotationXML>
  <traceGroup id="tg_0_0_0">
    <annotationXML>
      <Tg_Truth>Bản</Tg_Truth>
    </annotationXML>
    <trace id="tr_0_0"> x1 y1, x2 y2, x3 y3, ....</trace>
    <trace id="tr_0_1"> x11 y11, x12 y12, x13 y13, ....</trace>
  </traceGroup>
  ...
</ink>
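
As an illustration of how a file with this structure might be read, the following is a minimal Python sketch using only the standard library. It is not part of the dataset distribution. The function names (local, parse_trace, parse_ink) and the coordinate values in SAMPLE are hypothetical, and the sketch assumes the flat layout shown above, in which "trace" elements are direct children of a "traceGroup". Nested "traceGroup" elements, if the actual files contain them, are not handled. Tag names are compared without any XML namespace prefix, in case the real files declare one.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def parse_trace(text):
    """Turn 'x1 y1, x2 y2, ...' into a list of (x, y) float tuples."""
    points = []
    for chunk in (text or "").split(","):
        fields = chunk.split()
        if len(fields) >= 2:  # skip empty or malformed chunks
            points.append((float(fields[0]), float(fields[1])))
    return points


def parse_ink(xml_text):
    """Return (metadata dict, list of trace groups) for one InkML document."""
    root = ET.fromstring(xml_text)
    metadata, groups = {}, []
    for child in root:
        name = local(child.tag)
        if name == "annotationXML":
            # File-level description section: one entry per child tag.
            for item in child:
                metadata[local(item.tag)] = (item.text or "").strip()
        elif name == "traceGroup":
            truth, traces = None, []
            for node in child:  # direct children only (flat layout assumed)
                node_name = local(node.tag)
                if node_name == "annotationXML":
                    for item in node:
                        if local(item.tag) == "Tg_Truth":
                            truth = (item.text or "").strip()
                elif node_name == "trace":
                    traces.append(parse_trace(node.text))
            groups.append({"id": child.get("id"), "truth": truth, "traces": traces})
    return metadata, groups


# Hypothetical sample: same tags as above, made-up coordinates.
SAMPLE = """<ink>
  <annotationXML>
    <Language>Vietnamese</Language>
    <SamplingRate>120</SamplingRate>
  </annotationXML>
  <traceGroup id="tg_0_0_0">
    <annotationXML>
      <Tg_Truth>Bản</Tg_Truth>
    </annotationXML>
    <trace id="tr_0_0"> 10 20, 11 22, 13 25</trace>
    <trace id="tr_0_1"> 30 40, 31 41</trace>
  </traceGroup>
</ink>"""

metadata, groups = parse_ink(SAMPLE)
print(metadata["SamplingRate"], len(groups), [len(t) for t in groups[0]["traces"]])
```

Each returned group keeps its "id" attribute, its ground-truth string, and one list of points per stroke, so the same function could in principle be applied to the word-, line-, or paragraph-level files, provided they follow the layout shown above.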

 

File | Type | Size | Downloads | Description
InkData_line.zip | data | 239 MB | 31 | VNOnDB_Line: compressed file of online handwritten patterns with ground truth at the line level.
VNOnDB_ICFHR2018_dataSplit.zip | data | 7 KB | 38 | VNOnDB2018 split into the training, validation, and testing sets used for the ICFHR2018 Competition on Vietnamese Online Handwritten Text Recognition.
InkData_paragraph.zip | data | 239 MB | 20 | VNOnDB_Paragraph: compressed file of online handwritten patterns with ground truth at the paragraph level.
InkData_word.zip | data | 246 MB | 30 | VNOnDB_Word: compressed file of online handwritten patterns with ground truth at the word level.

References

[1] H. T. Nguyen, C. T. Nguyen, P. T. Bao, M. Nakagawa, "A database of unconstrained Vietnamese online handwriting and recognition experiments by recurrent neural networks," https://www.sciencedirect.com/science/article/pii/S0031320318300141
