Handwritings Datasets for Urdu & English (IPC-WritDAR)

2024-03-20 (v. 1)

Contact author

Syed Ehsan Raza Ali Hamdani

Document Analysis Unit (DAU), Islamabad, Pakistan

ahsen988@gmail.com

923345808719

You can cite this dataset as: Syed Ehsan Raza Ali Hamdani, Handwritings Datasets for Urdu & English (IPC-WritDAR), 1, ID: IPC-WritDAR_1, URL: https://tc11.cvc.uab.es/datasets/IPC-WritDAR_1

Dataset Information

Dataset URL

https://sites.google.com/site/artificialtextdataset/handwritten-datasets-urduenglish

Keywords

Handwriting, Urdu, English, Writer Identification, Biometric Identification, Document Analysis

Description

Introduction:

This dataset provides handwritten samples in Urdu and English, intended to support research in handwriting analysis, particularly writer identification/verification and biometric identification. Its aim is to offer a common benchmark on which researchers can evaluate their algorithms and techniques. The database comprises 176 scanned handwritten samples from 44 writers, each of whom contributed four samples: two in English and two in Urdu. There are therefore 88 samples per script, scanned at 300 dpi and stored as PNG images. The printed texts that the writers copied were drawn from a variety of authentic online media sources.

Structure & Ground Truth:

The handwritten sample forms follow the layout of the IAM forms, chosen for its simplicity and coherence. Each sample document is divided into four parts:

  1. The top section contains a Unique Form ID.
  2. The second part includes printed text for the writer to transcribe.
  3. The third part is for the writer to reproduce the printed text in their natural handwriting.
  4. The final part provides an optional space for the writer to write their name.

Ground truth for writer identification and verification is encoded in each image's file name, which serves as a unique form/document identifier. Each image is named according to the following convention:

[One Letter Category Code][Two Digit Form Code/Form count]-[Three Digit Writer ID][Letter U representing Urdu]
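The naming convention above can be decoded mechanically. The sketch below is a minimal, hypothetical parser, assuming that the category code is a single ASCII letter, that the form code and writer ID are purely numeric, and that English samples simply omit the trailing "U". The file names used with it (e.g. "A01-001U.png") are illustrative only and are not taken from the dataset.

```python
import re
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional

# Hypothetical pattern for the documented convention:
# [category letter][two-digit form code]-[three-digit writer ID][optional 'U' for Urdu]
# ASSUMPTION: English samples carry no trailing letter; the category is one ASCII letter.
_NAME_RE = re.compile(
    r"^(?P<category>[A-Za-z])(?P<form>\d{2})-(?P<writer>\d{3})(?P<urdu>U?)$"
)


@dataclass(frozen=True)
class SampleName:
    category: str
    form_code: str
    writer_id: str
    script: str  # "Urdu" or "English" (English inferred from the missing 'U')


def parse_sample_name(filename: str) -> Optional[SampleName]:
    """Parse an image name such as 'A01-001U.png'; return None if it does not match."""
    stem = PurePath(filename).stem  # drop directories and the '.png' extension
    m = _NAME_RE.match(stem)
    if m is None:
        return None
    return SampleName(
        category=m.group("category"),
        form_code=m.group("form"),
        writer_id=m.group("writer"),
        script="Urdu" if m.group("urdu") else "English",
    )
```

Returning None instead of raising lets a caller skip unrelated files (for example documentation or archive leftovers) when scanning an extracted folder.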

To retrieve all four samples of a particular writer, search the entire database for that writer's three-digit ID.
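The writer-ID lookup described above amounts to grouping file names by their three-digit writer field. The sketch below is one possible way to do this, assuming the same file-name pattern (one letter, two digits, hyphen, three digits, optional "U"); the helper names `group_by_writer` and `incomplete_writers` are hypothetical, and the check against four samples per writer simply mirrors the counts stated in the introduction.

```python
import re
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List

# ASSUMPTION: the writer ID is the three digits right after the hyphen, and an
# optional trailing 'U' marks an Urdu sample, as in the documented convention.
_WRITER_RE = re.compile(r"^[A-Za-z]\d{2}-(\d{3})U?$")


def group_by_writer(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each three-digit writer ID to the sorted list of its image names."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        m = _WRITER_RE.match(path.stem)
        if m:  # silently skip files that do not follow the convention
            groups[m.group(1)].append(path.name)
    return {wid: sorted(names) for wid, names in sorted(groups.items())}


def incomplete_writers(groups: Dict[str, List[str]], expected: int = 4) -> Dict[str, int]:
    """Return writer IDs whose sample count differs from the expected number."""
    return {wid: len(names) for wid, names in groups.items() if len(names) != expected}
```

For example, after extracting the archive into a local folder (the path is up to you), `group_by_writer(str(p) for p in Path(folder).rglob("*.png"))` would collect every writer's samples, and an empty result from `incomplete_writers` would indicate that each writer has exactly four images.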

[Figure: Handwritten sample from the English dataset]

Researchers using this dataset are encouraged to cite the following associated study:

A. Raza, I. Siddiqi, A. Abidi, and F. Arif, "An Unconstrained Benchmark Urdu Handwritten Sentence Database with Automatic Line Segmentation," 2012 International Conference on Frontiers in Handwriting Recognition, Bari, Italy, 2012, pp. 491-496, doi: 10.1109/ICFHR.2012.177.

File | Type | Size | Downloads | Description
Database with GT.rar | data | 122 MB | 5 | Handwriting dataset
An_Unconstrained_Benchmark_Urdu_Handwritten_Sentence_Database_with_Automatic_Line_Segmentation.pdf | article | 1 MB | 5 | Associated published research study for the dataset