Malayalam Character Image Database (Amrita_MalCharDb)
Keywords: Handwritten, Malayalam, Character Recognition
Before handwritten data collection began, the Malayalam character classes were chosen based on the unique orthographic structures of the Malayalam script. A total of 85 character classes, covering the vowels, consonants, half-consonants, vowel modifiers, consonant modifiers and conjunct characters that are frequently used in writing, are considered for the database. To collect character images, the writers were instructed to write each of these character classes five times on paper using ballpoint pens, taking care to leave space between consecutive characters. No restriction was placed on the type or quality of the paper or the ballpoint pen used for writing.
The handwritten data were collected from 77 native Malayalam writers (60 female, 17 male) aged 20 to 55, all of whom hold at least a bachelor's degree. The learning and testing data are split by writer rather than by image, so no writer contributes to both sets: the data from 59 of the 77 writers form the learning data (training and validation), and the data from the remaining 18 writers form the testing data.
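The writer-independent split can be illustrated with a short Python sketch. It uses hypothetical integer writer IDs (the description does not say whether writer IDs are distributed with the data) and a random assignment of writers to sets, which is not necessarily how the 59/18 division was actually made. It also computes the nominal number of images per set if every writer wrote each of the 85 classes exactly five times; these are upper bounds under that assumption, not reported dataset sizes.

```python
import random

NUM_CLASSES = 85        # character classes, from the dataset description
SAMPLES_PER_CLASS = 5   # each class written five times per writer
LEARN_WRITERS = 59
TEST_WRITERS = 18


def split_writers(writer_ids, n_test, seed=0):
    """Return (learning_ids, testing_ids) so that no writer is in both sets.

    Random assignment is an assumption made for illustration only.
    """
    ids = sorted(writer_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return set(ids[n_test:]), set(ids[:n_test])


def nominal_count(n_writers):
    """Upper bound on images if every writer wrote every class five times."""
    return n_writers * NUM_CLASSES * SAMPLES_PER_CLASS


# Hypothetical writer IDs 0..76; the released CSV files may not contain them.
learn_ids, test_ids = split_writers(range(LEARN_WRITERS + TEST_WRITERS), TEST_WRITERS)
print(len(learn_ids), len(test_ids), learn_ids.isdisjoint(test_ids))  # 59 18 True
print(nominal_count(LEARN_WRITERS), nominal_count(TEST_WRITERS))      # 25075 7650
```

Splitting on writers rather than on individual images means the handwriting styles in the test set are never seen during training, which is what makes the evaluation writer-independent.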
The fast global minimization algorithm for active contour models (ACM-FGM) is employed to detect the character objects in the collected document images. The resulting images are then converted to binary form using Otsu's global image thresholding algorithm, and each image is resized to 32 × 32 pixels.
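A minimal NumPy sketch of the binarization and resizing steps is given below. It starts from an already segmented grayscale character image, so the ACM-FGM detection step is not reproduced. The threshold follows Otsu's method in its textbook form (maximizing the between-class variance over a 256-bin histogram). The polarity (dark ink mapped to 1), the nearest-neighbour resampling and the helper names `otsu_threshold`, `resize_nearest` and `preprocess` are assumptions for illustration, since the description specifies none of them.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu's global threshold for a uint8 grayscale image.

    Returns the level t that maximises the between-class variance of the
    two classes {pixel <= t} and {pixel > t}.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of class {<= t}
    mu = np.cumsum(prob * np.arange(256))    # first moment up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 1e-12                    # skip thresholds leaving a class empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def resize_nearest(img, size=32):
    """Nearest-neighbour resize to size x size (resampling method assumed)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]


def preprocess(gray, size=32):
    """Binarize with Otsu (dark ink -> 1, bright paper -> 0), then resize."""
    t = otsu_threshold(gray)
    binary = (gray <= t).astype(np.uint8)
    return resize_nearest(binary, size)


# Synthetic 64x64 image: dark "ink" on the left half, bright paper on the right.
img = np.full((64, 64), 200, dtype=np.uint8)
img[:, :32] = 50
out = preprocess(img)
print(otsu_threshold(img), out.shape, out[:, :16].min(), out[:, 16:].max())  # 50 (32, 32) 1 0
```

The order here (threshold, then resize) follows the order in which the steps are described; resizing a binary image with nearest-neighbour sampling keeps it binary, whereas interpolating resamplers would introduce intermediate values.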
The dataset is provided with labels in CSV files. Each row holds one image: the first column is the character class label, and the remaining 1024 columns are the pixel values of the 32 × 32 image flattened into a 1024-element vector.
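The following Python sketch reads rows in the format described above and restores each 32 × 32 image. It assumes the CSV files have no header row and that the labels are numeric. The description also does not say whether the images were flattened row by row or column by column, so the `order` parameter (NumPy's `"C"` for row-major, `"F"` for column-major) is left configurable. `parse_rows` is a hypothetical helper, not part of the dataset release.

```python
import csv
import io

import numpy as np

IMG_SIDE = 32


def parse_rows(lines, order="C"):
    """Yield (label, 32x32 array) from rows of the form `label, p0, ..., p1023`.

    `order` is "C" (row-major) or "F" (column-major); the flattening order
    is not documented, so verify it by displaying a few images.
    """
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        label = int(float(row[0]))  # assumes a numeric class label
        pixels = np.array([float(v) for v in row[1:]], dtype=np.float32)
        if pixels.size != IMG_SIDE * IMG_SIDE:
            raise ValueError(f"expected 1024 pixel values, got {pixels.size}")
        yield label, pixels.reshape(IMG_SIDE, IMG_SIDE, order=order)


# One synthetic row: label 7, all pixels 0 except the last one.
demo = "7," + ",".join(["0"] * 1023 + ["1"])
(label, img), = parse_rows(io.StringIO(demo))
print(label, img.shape, img[31, 31])  # 7 (32, 32) 1.0
```

For a file from the archive, pass an open file object, for example `parse_rows(open(path, newline=""))` with `path` pointing at one of the CSV files; plotting a few decoded images is a quick way to confirm whether `"C"` or `"F"` order is correct.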
Files: handwritten.zip (3 MB): train, validation and test sets (3.61 MB).