ICDAR 2013 - Gender Identification Competition Dataset (GenderIdentifify2013)

2015-01-25 (v. 1)

Contact author

Abdelaali Hassaine

Qatar University



You can cite this dataset as: Abdelaali Hassaine, ICDAR 2013 - Gender Identification Competition Dataset (GenderIdentifify2013), v. 1, ID: GenderIdentifify2013_1, URL: https://tc11.cvc.uab.es/datasets/GenderIdentifify2013_1

Dataset Information


Gender Identification, Writer Identification


This is the dataset of the ICDAR 2013 - Gender Identification from Handwriting competition. If you use this database, please consider citing it as in [1].

This dataset is a subset of the QUWI dataset [2]. Each of the 475 writers produced four handwritten pages: the first page contains an Arabic text that varies from one writer to another, the second page contains an Arabic text that is the same for all writers, the third page contains an English text that varies from one writer to another, and the fourth page contains an English text that is the same for all writers.
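The four-page layout described above can be written down as a small lookup table. This is only an illustrative sketch: the names `PAGE_LAYOUT` and `describe_page` are hypothetical and are not part of the dataset.

```python
# Page layout as described in the text:
# page number -> (language, whether the text is the same for all writers).
PAGE_LAYOUT = {
    1: ("Arabic", False),
    2: ("Arabic", True),
    3: ("English", False),
    4: ("English", True),
}


def describe_page(page_id: int) -> str:
    """Return a short human-readable description of one of the four pages."""
    language, same_text = PAGE_LAYOUT[page_id]
    kind = "same text for all writers" if same_text else "writer-specific text"
    return f"page {page_id}: {language}, {kind}"
```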

Images were acquired with an EPSON GT-S80 scanner at a resolution of 300 DPI and were provided in uncompressed JPG format. The training set consists of the first 282 writers, for whom the gender is provided. Participants were asked to predict the gender of the remaining 193 writers.
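The split can be sketched as a simple rule on the writer ID. The sketch assumes that writer IDs run from 1 to 475 and that "the first 282 writers" means IDs 1 to 282; neither is stated explicitly above, and the helper name `split_of` is hypothetical.

```python
N_WRITERS = 475  # total number of writers, from the description
N_TRAIN = 282    # writers whose gender is provided


def split_of(writer_id: int) -> str:
    """Return "train" or "test" for a writer ID (assumes IDs 1..475 in order)."""
    if not 1 <= writer_id <= N_WRITERS:
        raise ValueError(f"writer ID out of range: {writer_id}")
    return "train" if writer_id <= N_TRAIN else "test"
```

Under these assumptions the rule leaves 475 - 282 = 193 writers in the test set, matching the figure given above.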

In addition to the images, features extracted from the data were also provided in order to make the competition accessible to people without image processing skills. Those features are described in [3].

[1] Hassaïne, Abdelâali, et al. "ICDAR 2013 Competition on Gender Prediction from Handwriting." Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE, 2013.

[2] Hassaïne, Abdelaali, et al. "The ICDAR2011 Arabic writer identification contest." Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011.

[3] Al Maadeed, Somaya, and Abdelaali Hassaine. "Automatic prediction of age, gender, and nationality in offline handwriting." EURASIP Journal on Image and Video Processing 2014.1 (2014): 1-10.


Technical Details

Images are provided in zip files. For convenience, they are split into groups of 50 writers.
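The grouping can be turned into a small helper that finds the archive for a given writer. This is a sketch under two assumptions: archives are named `<first>_<last>.zip`, as the file names 1_50.zip and 451_475.zip in the file list suggest, and each archive holds exactly the writers whose IDs fall in that range. The function name `archive_for_writer` is hypothetical.

```python
def archive_for_writer(writer_id: int, n_writers: int = 475, group_size: int = 50) -> str:
    """Return the assumed zip archive name for a writer ID."""
    if not 1 <= writer_id <= n_writers:
        raise ValueError(f"writer ID out of range: {writer_id}")
    # First ID of the group of 50 that contains this writer.
    first = ((writer_id - 1) // group_size) * group_size + 1
    # The last group is shorter because 475 is not a multiple of 50.
    last = min(first + group_size - 1, n_writers)
    return f"{first}_{last}.zip"
```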

The images are named XXXX_Y.jpg where XXXX is the ID of the writer and Y is the ID of the document.
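A file name following this pattern can be split into its two IDs with a regular expression. The sketch below makes assumptions that the description does not confirm: the writer ID may or may not be zero-padded (any number of digits is accepted), and the document ID is taken to be a single digit from 1 to 4, matching the four pages per writer. The name `parse_image_name` is hypothetical.

```python
import re

# XXXX_Y.jpg: writer ID (digits), underscore, document ID (assumed 1-4).
_NAME_RE = re.compile(r"^(\d+)_([1-4])\.jpg$", re.IGNORECASE)


def parse_image_name(name: str) -> tuple[int, int]:
    """Return (writer_id, document_id) parsed from an image file name."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    return int(match.group(1)), int(match.group(2))
```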

The features of the training and test sets are in train.csv and test.csv, which are provided as the zipped archives train.zip and test.zip.

train.csv and test.csv contain the following columns:

  • writer: the ID of the writer
  • page_id: from 1 to 4
  • language: Arabic or English
  • same_text: whether or not the text for this page is the same for all writers (same_text=1 for page_ids 2 and 4)
  • The remaining columns are features
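The column layout above suggests a simple way to group feature vectors by writer and page using only the standard library. This is a minimal sketch, not the organizers' loader: it assumes that every column other than the four listed is a numeric feature, and the feature column names `f0` and `f1` and all values in the sample are synthetic placeholders, not real data.

```python
import csv
import io
from collections import defaultdict

# The four metadata columns named in the description.
META_COLUMNS = ("writer", "page_id", "language", "same_text")


def load_features(handle):
    """Group feature vectors as {writer: {page_id: [floats]}}."""
    reader = csv.DictReader(handle)
    # Assumption: every non-metadata column holds a numeric feature.
    feature_names = [c for c in reader.fieldnames if c not in META_COLUMNS]
    by_writer = defaultdict(dict)
    for row in reader:
        features = [float(row[name]) for name in feature_names]
        by_writer[int(row["writer"])][int(row["page_id"])] = features
    return dict(by_writer)


# Synthetic two-row sample; column names f0/f1 and all values are made up.
sample = io.StringIO(
    "writer,page_id,language,same_text,f0,f1\n"
    "1,1,Arabic,0,0.5,1.5\n"
    "1,2,Arabic,1,0.25,2.0\n"
)
pages = load_features(sample)
```

With the real files, the same function could be called on an open handle to the extracted train.csv or test.csv.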

  • icdar2013_gender.pdf (article, 378 KB): ICDAR2013 - Competition on Gender Prediction from Handwriting
  • 1_50.zip (data, 230 MB)
  • 51_100.zip (data, 237 MB)
  • 101_150.zip (data, 238 MB)
  • 151_200.zip (data, 261 MB)
  • 201_250.zip (data, 196 MB)
  • 251_300.zip (data, 191 MB)
  • 301_350.zip (data, 231 MB)
  • 351_400.zip (data, 237 MB)
  • 401_450.zip (data, 229 MB)
  • 451_475.zip (data, 99 MB)
  • train.zip (data, 10 MB): Features of the training set
  • test.zip (data, 7 MB): Features of the test set

