Synthetic dataset of ID and Travel Document (SIDTD)

2023-12-18 (v. 1)

Contact author

Carlos Boned Riera

Computer Vision Center

cboned@cvc.uab.cat

+34 627044012

You can cite this dataset as: Carlos Boned Riera, Synthetic dataset of ID and Travel Document (SIDTD), 1, ID: SIDTD_1, URL: https://tc11.cvc.uab.es/datasets/SIDTD_1

Dataset Information

Keywords

Dataset, Forgery detection, ID Documents verification

Description

The SIDTD dataset is an extension of the MIDV2020 dataset. Strictly speaking, the MIDV2020 dataset is itself composed of forged ID documents, since all of its documents are generated by means of AI techniques. In the SIDTD dataset, however, these generated documents are taken as representative of bona fide documents, and the documents we generate from them are considered their forged versions. The corpus covers ten European nationalities that are equally represented: Albanian, Azerbaijani, Estonian, Finnish, Greek, Lithuanian, Russian, Serbian, Slovakian, and Spanish.

 

We employ two techniques for generating composite presentation attack instruments (PAIs): Crop & Replace and inpainting.
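To give a rough idea of what a Crop & Replace style manipulation involves, the sketch below copies a rectangular region from one image into the same position of another. It is only an illustration under stated assumptions, not the procedure used to build SIDTD: images are modelled as nested lists of pixel values, and the function name `crop_and_replace` and the `(top, left, height, width)` box layout are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def crop_and_replace(target: Image, source: Image,
                     box: Tuple[int, int, int, int]) -> Image:
    """Return a copy of `target` whose `box` region is taken from `source`.

    `box` is (top, left, height, width); this layout is an assumption
    made for the example, not a format defined by the dataset.
    """
    top, left, height, width = box
    forged = [row[:] for row in target]  # copy so the input stays untouched
    for r in range(top, top + height):
        # Overwrite the field region with the pixels cropped from `source`.
        forged[r][left:left + width] = source[r][left:left + width]
    return forged


# Two tiny 3x4 "documents": one all zeros, one all nines.
target = [[0] * 4 for _ in range(3)]
source = [[9] * 4 for _ in range(3)]
forged = crop_and_replace(target, source, (1, 1, 2, 2))
```

In a real setting the region would correspond to a document field (for example a name or photo area) and the two images would be documents of the same type, so that the replaced field lands in a plausible position.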

The dataset contains videos and clips of captured ID documents with different backgrounds; we add the same type of data for the forged ID document images generated using the techniques described. The protocol employed to generate the dataset is as follows: we printed 191 counterfeit ID documents on paper using an HP Color LaserJet E65050 printer. The documents were then laminated with 100-micron-thick laminating pouches to enhance realism and manually cropped.

CVC employees were asked to record videos of forged ID documents from SIDTD with their smartphones. This approach aimed to capture a diverse range of video qualities, backgrounds, durations, and light intensities.

 

Download links

Dataset

Defined Splits

 

 

Technical Details

For each generated document, a JSON file with information about the document generation process has been created. For MIDV2020 instances, we keep the original JSON file structure to preserve consistency with the MIDV2020 dataset.

 

Real images metadata

 

 

 

 

 

For the fake travel and ID documents, we added the following information:

Forgery Images metadata

 

 

 

 

 

 

Finally, for the clips extracted from the recorded videos of fake travel and ID documents, we added to the metadata the bounding box coordinates of the document's location within the counterfeit clip. We follow the same bounding box representation used in the original MIDV2020 dataset.

 

Forgery Clips metadata

 

 

 

 

 

 

In addition, files with a defined data partition are provided so that the results shown in the paper can be reproduced. Each file lists image paths together with a ground-truth label (1 or 0) indicating whether the image is fake or not.
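As a minimal sketch of how such a partition file might be consumed, the snippet below reads rows of image path and label into a list of pairs. The exact file layout is not specified here, so the comma-separated format, the optional header row, and the helper name `read_split` are all assumptions made for illustration; adapt them to the files actually distributed.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_split(handle: TextIO) -> List[Tuple[str, int]]:
    """Parse (image_path, label) pairs from an open text file.

    Assumes (hypothetically) one comma-separated row per image, with the
    path in the first column and a 0/1 label in the second.
    """
    samples = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        path, label = row[0].strip(), row[1].strip()
        if label not in ("0", "1"):
            continue  # skip a header row or any non-binary label
        samples.append((path, int(label)))
    return samples


# In-memory stand-in for a split file; the paths are placeholders.
example = io.StringIO("image_path,label\nimages/a.jpg,0\nimages/b.jpg,1\n")
samples = read_split(example)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory example.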
