Urdu Artificial Text Dataset (IPC-ArtifDAR)

2024-03-20 (v. 1)

Contact author

Syed Ehsan Raza Ali Hamdani

Document Analysis Unit (DAU), Islamabad, Pakistan.

ahsen988@gmail.com

923345808719

You can cite this dataset as: Syed Ehsan Raza Ali Hamdani, Urdu Artificial Text Dataset (IPC-ArtifDAR), version 1, ID: IPC-ArtifDAR_1, URL: https://tc11.cvc.uab.es/datasets/IPC-ArtifDAR_1

Dataset Information

Dataset URL

https://sites.google.com/site/artificialtextdataset/document-analysis-dataset-urdu-script

Keywords

Urdu, Caption Text Detection, Artificial Text, Text Localization

Description

Data Set Statistics

The dataset comprises images extracted from 19 distinct Urdu news channel videos, alongside a small collection from non-news channels.

  • All images are stored in PNG format.
  • Image file names are numeric (e.g., 01, 02).

Ground Truth Labeling

The ground truth data folder is located within each original image folder under the name gt_rect.

The gt_rect folder contains one .dat file per image; the files can be opened in a text editor such as WordPad. Each file gives the dimensions of the rectangles drawn around the artificial text present in the image.

Each image and its ground truth .dat file share the same name.
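
The folder layout described above can be walked with a few lines of Python. This is a minimal sketch, not part of the dataset distribution: the function name pair_images_with_ground_truth is hypothetical, and it assumes lowercase .png extensions and that the .dat file name matches the image name without its extension.

```python
from pathlib import Path


def pair_images_with_ground_truth(image_dir):
    """Return (image_path, dat_path) pairs for one image folder.

    Assumption: PNG images sit directly in image_dir and their
    same-named .dat files sit in image_dir/gt_rect.
    """
    image_dir = Path(image_dir)
    gt_dir = image_dir / "gt_rect"
    pairs = []
    for image_path in sorted(image_dir.glob("*.png")):
        # "01.png" is expected to map to "gt_rect/01.dat".
        dat_path = gt_dir / (image_path.stem + ".dat")
        # Keep only images that actually have a ground truth file.
        if dat_path.is_file():
            pairs.append((image_path, dat_path))
    return pairs
```

Images without a matching .dat file are skipped rather than reported, which may or may not suit a given evaluation setup.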

The format of the rectangle dimensions is as follows:

Figure 1: A sample .dat file

  • Each row of the file shown in Figure 1 gives the dimensions of one rectangle.

  • X = x coordinate of the drawn rectangle.

  • Y = y coordinate of the drawn rectangle.

  • Width and Height are the actual width and height of the rectangle.
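
Reading the rectangles might look like the sketch below. The exact text layout of a .dat file is not reproduced here, so the code makes several assumptions: each row carries four integer values in the order X, Y, Width, Height, any separators or labels between them can be ignored, and (X, Y) is the top-left corner of the rectangle. The names Rect, parse_rectangles and to_corners are hypothetical.

```python
import re
from typing import List, NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def parse_rectangles(text: str) -> List[Rect]:
    """Parse the contents of one .dat file into rectangles.

    Assumption: one rectangle per row, integer values, in the
    order X, Y, Width, Height. Rows with fewer than four integers
    (blank lines, headers) are skipped.
    """
    rects = []
    for line in text.splitlines():
        # Pull out every integer on the row, ignoring separators or labels.
        numbers = [int(n) for n in re.findall(r"-?\d+", line)]
        if len(numbers) >= 4:
            rects.append(Rect(*numbers[:4]))
    return rects


def to_corners(rect: Rect):
    """Convert to (left, top, right, bottom).

    Assumption: (x, y) is the top-left corner of the rectangle.
    """
    return (rect.x, rect.y, rect.x + rect.width, rect.y + rect.height)
```

For example, parse_rectangles(open("gt_rect/01.dat").read()) would return one Rect per labeled text region, provided the file matches the assumed layout. If the values turn out to be non-integer or ordered differently, the regular expression and field order need adjusting.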

CITATION

When referencing this dataset, please cite it as:

I. Siddiqi and A. Raza, “A Database of Artificial Urdu Text in Video Images with Semi-Automatic Text Line Labeling Scheme”, In Proceedings of the 4th International Conference on Advances in Multimedia, MMEDIA 2012, France, pp. 75-81.

File | Type | Size | Downloads | Description
mmedia_2012_4_20_40133.pdf | article | 1012 KB | 0 | Associated & Published Research Study for the Dataset
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 1(1).rar | data | 41 MB | 0 | Dataset - Part 1 of 1
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 1(2).rar | data | 35 MB | 0 | Dataset - Part 2 of 1
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 3.rar | data | 60 MB | 0 | Dataset - Part 3
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 4(1).rar | data | 63 MB | 0 | Dataset - Part 1 of 4
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 4(3).rar | data | 62 MB | 0 | Dataset - Part 3 of 4
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 4(4).rar | data | 61 MB | 0 | Dataset - Part 4 of 4
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 2.rar | data | 85 MB | 0 | Dataset - Part 2
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 4(2).rar | data | 66 MB | 0 | Dataset - Part 2 of 4