ICFHR 2014 CROHME: Fourth International Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME-2014)

Research Tasks

Isolated Mathematical Symbol Recognition

2015-02-16 (v. 1)

Contact author

Harold Mouchère

University of Nantes / IRCCyN

harold.mouchere@univ-nantes.fr

+33 2-40-68-30-82



This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Description

There are a large number of mathematical symbols, rendering mathematical symbol recognition a challenging task. In the last CROHME competition, there were 101 mathematical symbol classes. In a handwritten expression, symbols need to be both segmented and classified, forcing classifiers to consider mis-segmented (i.e. invalid) symbols. In this competition, we will compare classification systems in two ways: 1) on valid symbols only, and 2) on both valid symbols and mis-segmented (invalid) symbols. Mis-segmented symbols should be assigned to a reject class.

Recognizing mathematical expressions without an accurate symbol classifier is very difficult. Processes for locating symbols (i.e. segmenting) and identifying relationships between symbols (i.e. parsing) need to consider symbol identities. While classification is also dependent upon segmentation and parsing decisions, isolated symbol classification is the logical first step in constructing a handwritten math recognizer. This is because the classifier output space is fixed, while segmentation and parsing involve search. A classifier can be incorporated into objective functions for search algorithms used in segmentation and parsing, so that classifier outputs can help drive the search towards interpretations that include valid symbol hypotheses.

The train and test isolated symbol datasets are extracted from the train and test full-expression datasets of the main CROHME task. The extraction script is available in the tool kit.

Protocol

The systems are tested under two conditions: 1) only valid symbols, and 2) valid symbols and mis-segmented symbols ("junk"). In both cases, the Top-N recognition rates from Top-1 through Top-10 are computed. In the condition with junk samples, the False Positive and False Rejection Rates are also computed.
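
The measures above can be sketched in a few lines of Python. This is an illustrative sketch only, not the official evaluation script from the tool kit. The function names, the "junk" label string, and the input layout (one ranked candidate list per sample) are assumptions made for this example. It also assumes that a false positive is a junk sample whose top candidate is a valid symbol class, and that a false rejection is a valid sample whose top candidate is the reject class; the description above does not spell out these definitions.

```python
from typing import Dict, List, Sequence

JUNK = "junk"  # hypothetical label for the reject class


def top_n_rates(truth: Sequence[str],
                ranked: Sequence[Sequence[str]],
                max_n: int = 10) -> List[float]:
    """Return [Top-1, ..., Top-max_n] recognition rates.

    truth[i] is the ground-truth label of sample i; ranked[i] is the
    classifier's candidate list for that sample, best candidate first.
    """
    hits = [0] * max_n
    for label, candidates in zip(truth, ranked):
        top = list(candidates[:max_n])
        if label in top:
            rank = top.index(label)
            # A hit at a given rank counts for every N that includes that rank.
            for n in range(rank, max_n):
                hits[n] += 1
    total = len(truth)
    return [h / total if total else 0.0 for h in hits]


def rejection_rates(truth: Sequence[str],
                    ranked: Sequence[Sequence[str]]) -> Dict[str, float]:
    """False positive and false rejection rates from Top-1 decisions.

    Assumed definitions (not taken from the competition description):
      false positive  = junk sample whose top candidate is a valid class
      false rejection = valid sample whose top candidate is the reject class
    Every candidate list is assumed to be non-empty.
    """
    junk_total = sum(1 for t in truth if t == JUNK)
    valid_total = len(truth) - junk_total
    false_pos = sum(1 for t, r in zip(truth, ranked)
                    if t == JUNK and r[0] != JUNK)
    false_rej = sum(1 for t, r in zip(truth, ranked)
                    if t != JUNK and r[0] == JUNK)
    return {
        "false_positive_rate": false_pos / junk_total if junk_total else 0.0,
        "false_rejection_rate": false_rej / valid_total if valid_total else 0.0,
    }
```

For the valid-symbols-only condition, only top_n_rates applies. For the condition with junk, the reject class is treated as one more label in the ranked lists, so both functions can be run on the same inputs.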
