Synchromedia Multispectral Ancient Document Images Dataset (SMADI)

2018-08-05 (v. 1)

Contact author

Abderrahmane Rahiche

Synchromedia lab, ETS, Quebec University

arahiche@yahoo.com

+1 (514) 396-8972

+1 (514) 396-8595

You can cite this dataset as: Abderrahmane Rahiche, Synchromedia Multispectral Ancient Document Images Dataset (SMADI), 1, ID: SMADI_1, URL: https://tc11.cvc.uab.es/datasets/SMADI_1

Dataset Information

Keywords

Ancient Documents, Multispectral images

Description

Spectral analysis of writing materials is of great importance for the study of ancient documents. Multispectral (MS) imaging is a non-destructive technique for analyzing such materials. To support this kind of analysis, we collected a multispectral image database of ancient handwritten letters.

 

Multispectral images of an old manuscript.

 

The SMADI database consists of 240 multispectral images of 30 real historical handwritten letters. All manuscripts were written in iron gall ink and date from the 17th to the 20th century. The original documents were borrowed from Quebec’s national library, BAnQ (Bibliothèque et Archives nationales du Québec), and were imaged with a CROMA CX MSI camera, which produces 8 images per document (30 × 8 = 240). All images were calibrated for illumination and registered.

Technical Details

This dataset is organised as follows:

- The database is divided into two main folders: MSI and GT.
- The MSI folder contains 30 subfolders. Each subfolder has an ID composed of the letter "z" and a number (e.g. "z97"). Each subfolder contains the 8 MS images of one ancient document; see the table below for details. All MS images are already calibrated for illumination and registered.
- The GT folder contains the corresponding 30 ground-truth images saved in binary form (1 for background and 0 for text). The ID of each image starts with the letter "z", followed by a number, and ends with the letters "GT" (e.g. "z97GT").
- The writing date of each document is given in the "age.xlsx" file.
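The ground-truth convention described above (1 for background, 0 for text) is the reverse of what many binarization tools expect, so it can help to convert it explicitly. The following is a minimal sketch in plain Python; the function names are hypothetical, and it assumes the pixels have already been read into nested lists of 0/1 values. If the files on disk store other values (for example 0 and 255), they would need to be rescaled first.

```python
def text_mask(gt_rows):
    """Turn GT pixels (1 = background, 0 = text) into a mask where True marks text."""
    return [[pixel == 0 for pixel in row] for row in gt_rows]


def text_fraction(gt_rows):
    """Return the share of pixels labelled as text (value 0) in a GT image."""
    total = sum(len(row) for row in gt_rows)
    text = sum(row.count(0) for row in gt_rows)
    return text / total if total else 0.0


# Tiny hand-made example, not taken from the dataset.
gt = [
    [1, 1, 0],
    [1, 0, 0],
]
mask = text_mask(gt)
fraction = text_fraction(gt)
```

Here `mask` marks the three text pixels as `True`, and `fraction` is the ratio of text pixels to all pixels.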

The eight spectral bands are named as follows:

    Image name    Wavelength (nm)    Light filter
    -----------------------------------------------
    F1s.png       340                UV
    F2s.png       500                Visible 1 (Blue)
    F3s.png       600                Visible 2 (Green)
    F4s.png       700                Visible 3 (Red)
    F5s.png       800                IR 1
    F6s.png       900                IR 2
    F7s.png       1000               IR 3
    F8s.png       1100               IR 4
    -----------------------------------------------
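The folder layout and band naming described above can be turned into file paths with a few lines of Python. This is only a sketch under stated assumptions: the helper names and the root folder name "SMADI" are hypothetical, and the ".png" extension for ground-truth files is a guess, since the description gives only IDs such as "z97GT".

```python
from pathlib import Path

# Band file names and wavelengths in nm, copied from the table above.
BANDS = {
    "F1s.png": 340,
    "F2s.png": 500,
    "F3s.png": 600,
    "F4s.png": 700,
    "F5s.png": 800,
    "F6s.png": 900,
    "F7s.png": 1000,
    "F8s.png": 1100,
}


def band_paths(root, doc_id):
    """Map each wavelength (nm) to the path of that band image for one document."""
    doc_dir = Path(root) / "MSI" / doc_id
    return {nm: doc_dir / name for name, nm in BANDS.items()}


def ground_truth_path(root, doc_id, ext=".png"):
    """Build the ground-truth path for a document; the extension is an assumption."""
    return Path(root) / "GT" / f"{doc_id}GT{ext}"


# "SMADI" stands for wherever the archive was extracted (hypothetical).
paths = band_paths("SMADI", "z97")
gt_path = ground_truth_path("SMADI", "z97")
```

Keying the result by wavelength rather than by file name makes it straightforward to select, for example, only the infrared bands (800 nm and above).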

 

 

===========================================================
 SMADI: Synchromedia Multispectral Ancient Document Images Dataset
===========================================================

This database was collected by Rachid Hedjam and Mohamed Cheriet (2013/2014).

Email:  mohamed.cheriet@etsmtl.ca

Synchromedia Laboratory, ETS, École de technologie supérieure, University of Quebec.

 

The SMADI database is publicly accessible and freely available for non-commercial research purposes. Any other use requires written permission. If you publish scientific work based on SMADI, you are requested to cite the following papers:

 

[1] Hedjam, R., & Cheriet, M. (2013). Historical document image restoration using multispectral imaging system. Pattern Recognition, 46(8), 2297-2312.

[2] Hedjam, R., & Cheriet, M. (2013). Ground-truth estimation in multispectral representation space: Application to degraded document image binarization. In 2013 12th International Conference on Document Analysis and Recognition (ICDAR). IEEE.

 

File | Type | Size | Downloads | Description
MSI-dataset.zip | data | 79 MB | 70 | Dataset
Ground-truth estimation in multispectral representation space: Application to degraded document image binarization.pdf | article | 168 KB | 27 | ICDAR Conference paper
Historical document image restoration using multispectral imaging system.pdf | article | 2 MB | 19 | Pattern Recognition journal paper
dates.pdf | other | 12 KB | 20 | dates

