ICDAR 2013 - Gender Identification Competition Dataset (GenderIdentifify2013)

2015-01-25 (v. 1)

Contact author

Abdelaali Hassaine

Qatar University



You can cite this dataset as: Abdelaali Hassaine, ICDAR 2013 - Gender Identification Competition Dataset (GenderIdentifify2013), v. 1, ID: GenderIdentifify2013_1, URL: https://tc11.cvc.uab.es/datasets/GenderIdentifify2013_1

Dataset Information


Gender Identification, Writer Identification


This is the dataset of the ICDAR 2013 - Gender Identification from Handwriting competition. If you use this database, please consider citing it as in [1].

This dataset is a subset of the QUWI dataset [2]. A total of 475 writers each produced four handwritten pages: the first page contains an Arabic text that varies from one writer to another, the second page contains an Arabic text that is the same for all writers, the third page contains an English text that varies from one writer to another, and the fourth page contains an English text that is the same for all writers.
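The four-page layout can be written down as a small lookup table, which may be handy when filtering pages by language or by text type. The sketch below is illustrative only: the names PageSpec, PAGES and expected_page_count are hypothetical, and the page numbers 1 to 4 are assumed to follow the order in which the pages are described above.

```python
from typing import NamedTuple


class PageSpec(NamedTuple):
    """Describes one of the four pages each writer produced."""
    language: str    # "Arabic" or "English"
    same_text: bool  # True if every writer copied the same text


# Page number -> layout, assuming pages are numbered in the order described.
PAGES = {
    1: PageSpec("Arabic", False),
    2: PageSpec("Arabic", True),
    3: PageSpec("English", False),
    4: PageSpec("English", True),
}

N_WRITERS = 475  # number of writers stated in the dataset description


def expected_page_count(n_writers: int = N_WRITERS) -> int:
    """Number of page images if every writer contributed every page."""
    return n_writers * len(PAGES)
```

Under these assumptions a complete download would hold 475 x 4 = 1900 page images; a lower count would point to missing pages or archives.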

Images were acquired with an EPSON GT-S80 scanner at a resolution of 300 DPI and are provided in uncompressed JPG format. The training set consists of the first 282 writers, for whom the genders are provided. Participants were asked to predict the gender of the remaining 193 writers.
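The split can be expressed as a simple rule on the writer ID. The following is a minimal sketch, assuming that writer IDs are consecutive integers from 1 to 475 and that "the first 282 writers" means IDs 1 to 282; the page does not state either point explicitly, and the function name split_of is hypothetical.

```python
N_WRITERS = 475  # total number of writers in the dataset
N_TRAIN = 282    # writers whose gender labels were released


def split_of(writer_id: int) -> str:
    """Return "train" or "test" for a writer ID.

    Assumption: IDs run from 1 to N_WRITERS and the training
    writers are exactly IDs 1..N_TRAIN.
    """
    if not 1 <= writer_id <= N_WRITERS:
        raise ValueError(f"writer ID out of range: {writer_id}")
    return "train" if writer_id <= N_TRAIN else "test"


# Sanity check against the figures quoted in the description.
test_writers = [w for w in range(1, N_WRITERS + 1) if split_of(w) == "test"]
```

With these assumptions, test_writers contains 475 - 282 = 193 IDs, which matches the number of writers participants were asked to classify.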

In addition to the images, features extracted from the data were also provided in order to make the competition accessible to people without image processing skills. Those features are described in [3].

[1] Hassaïne, Abdelâali, et al. "ICDAR 2013 Competition on Gender Prediction from Handwriting." Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE, 2013.

[2] Hassaïne, Abdelaali, et al. "The ICDAR2011 Arabic writer identification contest." Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011.

[3] Al Maadeed, Somaya, and Abdelaali Hassaine. "Automatic prediction of age, gender, and nationality in offline handwriting." EURASIP Journal on Image and Video Processing 2014.1 (2014): 1-10.


Technical Details

Images are provided in zip files. For convenience, they are split into groups of 50 writers.
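To fetch the images of a single writer, it helps to know which archive to download. The sketch below assumes that each archive name encodes the range of writer IDs it contains (as the names 1_50.zip through 451_475.zip in the file list suggest) and that IDs run from 1 to 475; the function name archive_for_writer is hypothetical.

```python
def archive_for_writer(writer_id: int, group_size: int = 50,
                       n_writers: int = 475) -> str:
    """Guess the zip archive holding a writer's images.

    Assumption: archives are named "<first>_<last>.zip" after the
    writer ID range they cover, with the last group cut off at n_writers.
    """
    if not 1 <= writer_id <= n_writers:
        raise ValueError(f"writer ID out of range: {writer_id}")
    first = ((writer_id - 1) // group_size) * group_size + 1
    last = min(first + group_size - 1, n_writers)
    return f"{first}_{last}.zip"
```

For example, under these assumptions writer 51 maps to 51_100.zip and writer 475 maps to 451_475.zip.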

The images are named XXXX_Y.jpg where XXXX is the ID of the writer and Y is the ID of the document.
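A file name following this pattern can be decomposed with a regular expression. This is a sketch under assumptions: the writer ID is taken to be a run of digits (possibly zero-padded, as XXXX suggests), the document ID is taken to be a single digit from 1 to 4, and parse_image_name is a hypothetical helper name.

```python
import re

# XXXX_Y.jpg: digits for the writer, one digit 1-4 for the document.
_IMAGE_NAME = re.compile(r"^(\d+)_([1-4])\.jpg$", re.IGNORECASE)


def parse_image_name(name: str) -> tuple[int, int]:
    """Split an image file name into (writer_id, document_id)."""
    match = _IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    return int(match.group(1)), int(match.group(2))
```

Calling parse_image_name("0282_3.jpg") returns (282, 3); names that do not fit the pattern raise ValueError rather than being silently skipped.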

The features of the training and test sets are in train.csv and test.csv, which are provided as the zipped archives train.zip and test.zip.

train.csv and test.csv contain the following columns:

  • writer: the ID of the writer
  • page_id: from 1 to 4
  • language: Arabic or English
  • same_text: whether or not the text for this page is the same for all writers (same_text=1 for page_ids 2 and 4)
  • The remaining columns are features
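The column layout above suggests a simple way to separate metadata from features when reading either file. The sketch below uses only the Python standard library and runs on a tiny made-up sample; the feature names f1 and f2, the sample values, and the helper names are hypothetical. It also assumes that writer, page_id and same_text are stored as integers and that all feature columns are numeric, which the page does not confirm.

```python
import csv
import io

META_COLUMNS = ("writer", "page_id", "language", "same_text")


def read_feature_rows(handle):
    """Read rows as (metadata dict, feature dict) pairs."""
    rows = []
    for raw in csv.DictReader(handle):
        meta = {
            "writer": int(raw["writer"]),
            "page_id": int(raw["page_id"]),
            "language": raw["language"],
            "same_text": int(raw["same_text"]),
        }
        # Every column that is not metadata is treated as a numeric feature.
        features = {k: float(v) for k, v in raw.items()
                    if k not in META_COLUMNS}
        rows.append((meta, features))
    return rows


def same_text_matches_page(meta) -> bool:
    """Check the stated rule: same_text=1 exactly for page_ids 2 and 4."""
    return (meta["same_text"] == 1) == (meta["page_id"] in (2, 4))


# Made-up sample standing in for the first lines of train.csv.
SAMPLE = (
    "writer,page_id,language,same_text,f1,f2\n"
    "1,1,Arabic,0,0.5,1.25\n"
    "1,2,Arabic,1,0.75,2.0\n"
)
rows = read_feature_rows(io.StringIO(SAMPLE))
```

For the real files, the same function could be called on a handle such as open("train.csv", newline=""), once the archive has been extracted.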

icdar2013_gender.pdf  article  378 KB  154  ICDAR2013 - Competition on Gender Prediction from Handwriting
1_50.zip              data     230 MB  166
101_150.zip           data     238 MB  89
151_200.zip           data     261 MB  72
201_250.zip           data     196 MB  75
251_300.zip           data     191 MB  72
301_350.zip           data     231 MB  72
351_400.zip           data     237 MB  71
401_450.zip           data     229 MB  72
451_475.zip           data     99 MB   84
51_100.zip            data     237 MB  76
train.zip             data     10 MB   149  Features of the training set
test.zip              data     7 MB    102  Features of the test set
anguelos 12-23-2023 03:21
How do we get the gender of each writer?
Aditya Majithia 01-05-2024 00:24
I have the same question. Where can we get the gender associated with each writer?