Malayalam Character Image Database (Amrita_MalCharDb)

2019-07-21 (v. 1)

Contact author

Manjusha K

Research Scholar, Amrita University



You can cite this dataset as: Manjusha K, Malayalam Character Image Database (Amrita_MalCharDb), 1, ID: Amrita_MalCharDb_1, URL:

Dataset Information


handwritten, Malayalam, Character Recognition


Before handwritten data collection began, the Malayalam character classes were decided based on the unique orthographic structures of the Malayalam script. 85 Malayalam character classes, representing the vowels, consonants, half-consonants, vowel modifiers, consonant modifiers and conjunct characters that are frequently used in writing, were considered for database creation. To collect character images, the writers were instructed to write the considered Malayalam character classes on pages five times using ballpoint pens, paying attention to the space between written characters. No restriction was placed on the type or quality of the paper or the ballpoint pen used.

The handwritten data were collected from 77 native Malayalam writers (60 female and 17 male) aged between 20 and 55, all of whom have graduation as their minimum educational qualification. The learning and testing data are divided by writer rather than by collected image. Of the 77 writers, the handwritten data from 59 were used to create the learning (training and validation) data, while the handwritten data from the remaining 18 were used to create the testing data.
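The writer-based division described above can be illustrated with a short sketch. It assumes each sample carries a writer identifier, which the description does not say is included in the released files; the function name `split_by_writer`, the random selection of test writers and the seed are all hypothetical, and the actual assignment of writers to the two sets is not stated here.

```python
import random


def split_by_writer(writer_ids, n_test_writers, seed=0):
    """Split sample indices so that no writer appears in both sets.

    writer_ids: one (hypothetical) writer identifier per sample.
    Returns (train_indices, test_indices, test_writers).
    """
    unique_writers = sorted(set(writer_ids))
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(unique_writers)
    test_writers = set(unique_writers[:n_test_writers])

    train_indices = [i for i, w in enumerate(writer_ids) if w not in test_writers]
    test_indices = [i for i, w in enumerate(writer_ids) if w in test_writers]
    return train_indices, test_indices, test_writers


# Toy data: 77 writers with 5 samples each (sample counts are illustrative only).
writer_ids = [writer for writer in range(77) for _ in range(5)]
train_indices, test_indices, test_writers = split_by_writer(writer_ids, n_test_writers=18)
```

Splitting on writers rather than on images keeps a writer's handwriting style out of the learning data when that writer is used for testing, which is the property the description emphasises.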

A fast global minimization algorithm for active contour models (ACM-FGM) was employed to detect the character objects in the collected document images. Otsu's global image thresholding algorithm was used to convert the resulting image to a binary representation. Each image was then converted to a dimension of 32*32.
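For readers unfamiliar with the binarization step, the following is a minimal pure-Python sketch of Otsu's global threshold on 8-bit grey levels. It is not the author's implementation; the ACM-FGM detection and the resizing to 32*32 are not shown, and the polarity (pixels above the threshold mapped to 1) is an assumption.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximizes between-class variance."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low = 0  # number of pixels with grey level <= t
    sum_low = 0     # sum of grey levels of those pixels
    for t in range(256):
        weight_low += histogram[t]
        sum_low += t * histogram[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        if weight_high == 0:
            break     # all pixels are in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance_between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance_between > best_variance:
            best_variance, best_threshold = variance_between, t
    return best_threshold


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 and the rest to 0 (assumed polarity)."""
    return [1 if value > threshold else 0 for value in pixels]


# Toy example: a two-level image given as a flat list of grey values.
grey = [10] * 50 + [200] * 50
binary = binarize(grey, otsu_threshold(grey))
```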


Technical Details

The dataset is submitted along with labels in CSV files. Each row holds the pixel values of one image together with its class label. The first column is the character class label, and the remaining 1024 columns hold the image data in vectorized form (a 32*32 image flattened to a 1024-element vector).
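The row layout described above can be read with the Python standard library alone. The sketch below is illustrative: the absence of a header row and the row-major ordering of the 1024 pixel values are assumptions, and `parse_row` and `load_rows` are hypothetical helper names.

```python
import csv
import io

IMAGE_SIDE = 32                              # images are 32*32
PIXELS_PER_IMAGE = IMAGE_SIDE * IMAGE_SIDE   # 1024 pixel columns per row


def parse_row(row):
    """Split one CSV row into (label, image), where image is 32 rows of 32 values."""
    if len(row) != 1 + PIXELS_PER_IMAGE:
        raise ValueError(f"expected {1 + PIXELS_PER_IMAGE} columns, got {len(row)}")
    label = row[0].strip()  # kept as text; the label encoding is not assumed
    pixels = [int(float(value)) for value in row[1:]]
    # Assumption: the vector is stored row by row (row-major order).
    image = [pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)]
    return label, image


def load_rows(file_obj):
    """Parse every non-empty row of an open CSV file (assumes no header row)."""
    return [parse_row(row) for row in csv.reader(file_obj) if row]


# Synthetic one-row file standing in for a real CSV from the archive.
fake_row = ",".join(["7"] + [str(i % 2) for i in range(PIXELS_PER_IMAGE)])
samples = load_rows(io.StringIO(fake_row + "\n"))
```

With a real file, `load_rows(open(path, newline=""))` would be the equivalent call, where `path` points to one of the CSV files extracted from the archive.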

handwritten.zip (data, 3 MB): Train, validation and test sets (3.61 MB).
Akshay Vijayan 02-09-2020 06:10
Can you please provide image files of the dataset?