Urdu Artificial Text Dataset (IPC-ArtifDAR)
Dataset Information
Dataset URL
https://sites.google.com/site/artificialtextdataset/document-analysis-dataset-urdu-script
Keywords
Urdu, Caption Text Detection, Artificial Text, Text Localization
Description
Data Set Statistics
The dataset comprises images extracted from 19 distinct Urdu news channel videos, alongside a small collection from non-news channels.
- All images are stored in PNG format.
- Image files have numeric names (e.g., 01, 02).
Ground Truth Labeling
Each original image folder contains a ground truth subfolder named gt_rect.
The gt_rect folder contains one .dat file per image (the files can be opened in WordPad). Each file gives the dimensions of the rectangles drawn around the artificial text in that image.
Each image and its ground truth .dat file share the same name.
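The folder layout described above can be walked with a short script. The following is a minimal sketch, not part of the dataset's tooling: the function name `pair_images_with_ground_truth` is hypothetical, and the code assumes lowercase `.png` and `.dat` extensions, which the dataset page does not confirm.

```python
from pathlib import Path


def pair_images_with_ground_truth(image_dir):
    """Return (image_path, gt_path) pairs for one image folder.

    Assumed layout (taken from the description, extensions assumed lowercase):
    PNG images directly in image_dir, same-named .dat files in image_dir/gt_rect.
    """
    image_dir = Path(image_dir)
    gt_dir = image_dir / "gt_rect"
    pairs = []
    for image_path in sorted(image_dir.glob("*.png")):
        # The image and its ground truth file share the same base name.
        gt_path = gt_dir / (image_path.stem + ".dat")
        if gt_path.is_file():
            pairs.append((image_path, gt_path))
    return pairs
```

Images without a matching .dat file are skipped silently here; a stricter loader might report them instead.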
The format of the rectangle dimensions is as follows:
Figure 1: A sample .dat file
- Each row in the file shown in Figure 1 gives the dimensions of one rectangle.
- X = x coordinate of the drawn rectangle.
- Y = y coordinate of the drawn rectangle.
- Width and Height correspond to the actual width and height of the rectangle.
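Reading the rectangles programmatically might look like the sketch below. The exact on-disk layout of a .dat file is not reproduced on this page, so the code rests on an assumption: each data row holds four numeric values in the order X, Y, Width, Height, separated by spaces, commas, or similar, in an ASCII-compatible encoding. The names `Rect`, `parse_gt_text`, and `load_gt_file` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


# Matches integers and simple decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_gt_text(text: str) -> List[Rect]:
    """Parse ground truth rows, assuming X, Y, Width, Height order per row."""
    rects = []
    for line in text.splitlines():
        values = [float(v) for v in _NUMBER.findall(line)]
        if len(values) < 4:
            # Skip blank lines and rows without four numbers (e.g. a header).
            continue
        rects.append(Rect(*values[:4]))
    return rects


def load_gt_file(path: str) -> List[Rect]:
    """Read one .dat file; the encoding is an assumption, not documented."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return parse_gt_text(handle.read())
```

If X and Y denote the top-left corner, which the description does not state explicitly, the opposite corner of a rectangle `r` would be `(r.x + r.width, r.y + r.height)`.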
Citation
When referencing this dataset, please cite it as:
I. Siddiqi and A. Raza, “A Database of Artificial Urdu Text in Video Images with Semi-Automatic Text Line Labeling Scheme”, In Proceedings of the 4th International Conference on Advances in Multimedia, MMEDIA 2012, France, pp. 75-81.
File | Type | Size | Downloads | Description |
---|---|---|---|---|
mmedia_2012_4_20_40133.pdf | article | (1012 KB) | 0 | Associated & Published Research Study for the Dataset |
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 1(1).rar | data | (41 MB) | 2 | Dataset - Part 1 of 1 |
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 1(2).rar | data | (35 MB) | 0 | Dataset - Part 2 of 1 |
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 3.rar | data | (60 MB) | 0 | Dataset - Part 3 |
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 4(1).rar | data | (63 MB) | 0 | Dataset - Part 1 of 4 |
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 4(3).rar | data | (62 MB) | 0 | Dataset - Part 3 of 4 |
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 4(4).rar | data | (61 MB) | 0 | Dataset - Part 4 of 4 |
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 2.rar | data | (85 MB) | 0 | Dataset - Part 2 |
IPC-Artificial Text Data Set 1.0(Urdu Script)_Part 4(2).rar | data | (66 MB) | 0 | Dataset - Part 2 of 4 |