ICDAR 2013 - Gender Identification Competition Dataset (GenderIdentifify2013)
Dataset Information
Keywords
Gender Identification, Writer Identification
Description
This is the dataset of the ICDAR 2013 - Gender Identification from Handwriting competition. If you use this database, please consider citing it as in [1].
This dataset is a subset of the QUWI dataset [2]. A total of 475 writers each produced four handwritten pages: the first page contains an Arabic text that varies from one writer to another, the second an Arabic text that is the same for all writers, the third an English text that varies from one writer to another, and the fourth an English text that is the same for all writers.
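The page layout described above can be written down as a small lookup table. This is only an illustrative sketch: the names `PAGE_LAYOUT`, `describe_page` and `TOTAL_PAGES` are hypothetical, and the page count assumes that every writer contributed all four pages.

```python
# Page ID -> (language, whether the text is the same for all writers),
# following the description of the four pages above.
PAGE_LAYOUT = {
    1: ("Arabic", False),
    2: ("Arabic", True),
    3: ("English", False),
    4: ("English", True),
}


def describe_page(page_id: int) -> str:
    """Return a short human-readable description of a page."""
    language, same_text = PAGE_LAYOUT[page_id]
    kind = "same text for all writers" if same_text else "writer-specific text"
    return f"page {page_id}: {language}, {kind}"


# Assumption: all 475 writers produced all four pages.
TOTAL_PAGES = 475 * len(PAGE_LAYOUT)
```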
Images were acquired using an EPSON GT-S80 scanner at a resolution of 300 DPI and were provided in uncompressed JPG format. The training set consists of the first 282 writers, for whom the genders are provided. Participants were asked to predict the gender of the remaining 193 writers.
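As a rough sketch, the split can be expressed as a function of the writer ID. It assumes that writer IDs run from 1 to 475 and that "the first 282 writers" means IDs 1 to 282; neither is stated explicitly here, and the function name `split_for` is hypothetical.

```python
N_WRITERS = 475  # total number of writers in the dataset
N_TRAIN = 282    # writers whose gender labels were provided


def split_for(writer_id: int) -> str:
    """Return "train" or "test" for a writer ID.

    Assumption: IDs are 1..475 and the training writers are IDs 1..282.
    """
    if not 1 <= writer_id <= N_WRITERS:
        raise ValueError(f"writer ID out of range: {writer_id}")
    return "train" if writer_id <= N_TRAIN else "test"
```

Under these assumptions, the remaining IDs 283 to 475 account for the 193 test writers.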
In addition to the images, features extracted from the data were also provided to make the competition accessible to people without image processing skills. These features are described in [3].
[1] Hassaïne, Abdelâali, et al. "ICDAR 2013 Competition on Gender Prediction from Handwriting." Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE, 2013.
[2] Hassaïne, Abdelâali, et al. "The ICDAR2011 Arabic writer identification contest." Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011.
[3] Al Maadeed, Somaya, and Abdelâali Hassaïne. "Automatic prediction of age, gender, and nationality in offline handwriting." EURASIP Journal on Image and Video Processing 2014.1 (2014): 1-10.
Technical Details
Images are provided in zip files; for convenience, they are split into groups of 50 writers.
The images are named XXXX_Y.jpg where XXXX is the ID of the writer and Y is the ID of the document.
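The naming scheme and the grouping into archives can be sketched as follows. The helper names `parse_image_name` and `archive_for` are hypothetical. Whether the writer ID is zero-padded is not stated, so the pattern accepts any number of digits, and the mapping from writer to archive assumes that each archive holds exactly the ID range its name suggests.

```python
import re

# XXXX_Y.jpg: writer ID, underscore, document ID.
FILENAME_RE = re.compile(r"^(\d+)_(\d+)\.jpg$", re.IGNORECASE)


def parse_image_name(name: str) -> tuple[int, int]:
    """Split an image file name into (writer_id, document_id)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def archive_for(writer_id: int, group_size: int = 50, n_writers: int = 475) -> str:
    """Return the zip archive expected to contain a writer's images.

    Assumption: archives cover consecutive ID ranges (1_50, 51_100, ...),
    with the last one truncated at the final writer ID.
    """
    if not 1 <= writer_id <= n_writers:
        raise ValueError(f"writer ID out of range: {writer_id}")
    first = (writer_id - 1) // group_size * group_size + 1
    last = min(first + group_size - 1, n_writers)
    return f"{first}_{last}.zip"
```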
The features of the training and test sets are in train.csv and test.csv, which are provided as zipped archives.
train.csv and test.csv contain the following columns:
- writer: the ID of the writer
- page_id: from 1 to 4
- language: Arabic or English
- same_text: whether or not the text for this page is the same for all writers (same_text=1 for page_ids 2 and 4)
- The remaining columns are features
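A minimal sketch of reading such a file with Python's standard `csv` module, separating the four metadata columns from the feature columns. It assumes the files are comma-delimited with a header row and that every feature value is numeric; neither is confirmed here. The sample data and the feature names `f1` and `f2` are made up for illustration, and `load_features` is a hypothetical helper.

```python
import csv
import io

# Metadata columns listed above; everything else is treated as a feature.
META_COLUMNS = ("writer", "page_id", "language", "same_text")


def load_features(fileobj):
    """Read a feature file and return (feature_names, rows).

    Each row is a (metadata dict, list of float features) pair.
    Assumption: comma-delimited, header row, numeric feature values.
    """
    reader = csv.DictReader(fileobj)
    feature_names = [c for c in (reader.fieldnames or []) if c not in META_COLUMNS]
    rows = []
    for record in reader:
        meta = {c: record[c] for c in META_COLUMNS}
        features = [float(record[c]) for c in feature_names]
        rows.append((meta, features))
    return feature_names, rows


# Made-up sample with two hypothetical feature columns.
SAMPLE = (
    "writer,page_id,language,same_text,f1,f2\n"
    "1,1,Arabic,0,0.5,1.25\n"
    "1,2,Arabic,1,0.75,2.0\n"
)
feature_names, rows = load_features(io.StringIO(SAMPLE))
```

For the real files, `load_features(open("train.csv", newline=""))` would be the analogous call once train.zip has been extracted.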
File | Type | Size | Downloads | Description |
---|---|---|---|---|
icdar2013_gender.pdf | article | 378 KB | 155 | ICDAR2013 - Competition on Gender Prediction from Handwriting |
1_50.zip | data | 230 MB | 167 | |
51_100.zip | data | 237 MB | 76 | |
101_150.zip | data | 238 MB | 89 | |
151_200.zip | data | 261 MB | 72 | |
201_250.zip | data | 196 MB | 75 | |
251_300.zip | data | 191 MB | 72 | |
301_350.zip | data | 231 MB | 72 | |
351_400.zip | data | 237 MB | 71 | |
401_450.zip | data | 229 MB | 72 | |
451_475.zip | data | 99 MB | 84 | |
train.zip | data | 10 MB | 149 | Features of the training set |
test.zip | data | 7 MB | 103 | Features of the test set |