Film Videos Dataset (FiViD)
Movie, Subtitles, Text, graphical
This dataset contains images with and without text, extracted from movies with integrated Arabic subtitles.
The target task is classification of text versus non-text images.
The dataset is composed of 4 archives/folders:
- a train set for text images: 25496 items
- a train set for non-text images: 34760 items
- a test set for text images: 5805 items
- a test set for non-text images: 8052 items
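The item counts above imply a moderate class imbalance, which is worth knowing before choosing an evaluation metric. The following is a minimal sketch that derives the split totals and the share of text images from the stated counts; the names `COUNTS` and `split_summary` are illustrative and not part of the dataset.

```python
# Item counts as stated in the dataset description.
COUNTS = {
    "train": {"text": 25496, "non_text": 34760},
    "test": {"text": 5805, "non_text": 8052},
}


def split_summary(counts):
    """Return per-split totals and the fraction of text images in each split."""
    summary = {}
    for split, by_class in counts.items():
        total = sum(by_class.values())
        summary[split] = {
            "total": total,
            "text_fraction": by_class["text"] / total,
        }
    return summary


summary = split_summary(COUNTS)
grand_total = sum(stats["total"] for stats in summary.values())

for split, stats in summary.items():
    print(f"{split}: {stats['total']} images, {stats['text_fraction']:.1%} text")
print(f"all: {grand_total} images")
```

Non-text images are the majority class in both splits, so a classifier that always predicts "non-text" would already score above 50% accuracy. Reporting per-class precision and recall alongside accuracy is one way to account for this.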
All images are 200 * 40 pixels and are cropped from the lower part of the film frame, where subtitles appear.
All images are in TIF format.
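Once the archives are extracted, a labelled file index is a convenient starting point for the classification task. The sketch below uses only the Python standard library. It assumes each archive unpacks into a folder named after the archive (for example `Text Training Dataset`); this layout is an assumption, so adjust `FOLDERS` to match the actual extracted structure. The names `FOLDERS` and `index_images` are hypothetical helpers, not part of the dataset.

```python
from pathlib import Path

# Assumed folder names, derived from the archive names; adjust as needed.
# Keys are (split, label) with label 1 = text, 0 = non-text.
FOLDERS = {
    ("train", 1): "Text Training Dataset",
    ("train", 0): "Non_Text Training Dataset",
    ("test", 1): "Text Testing Dataset",
    ("test", 0): "Non_Text Testing Dataset",
}


def index_images(root, folders=FOLDERS, extensions=(".tif", ".tiff")):
    """Return a list of (path, split, label) tuples for all TIF files found."""
    root = Path(root)
    records = []
    for (split, label), name in folders.items():
        folder = root / name
        if not folder.is_dir():
            continue  # skip archives that have not been extracted
        for path in sorted(folder.rglob("*")):
            # Match the extension case-insensitively (.tif, .TIF, .tiff, ...).
            if path.is_file() and path.suffix.lower() in extensions:
                records.append((path, split, label))
    return records


if __name__ == "__main__":
    # "FiViD" is a placeholder for the directory holding the extracted folders.
    records = index_images("FiViD")
    print(f"indexed {len(records)} images")
```

The resulting list can be passed to any image library that reads TIF files to load the pixel data for training and evaluation.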
Below are two sample original frames from which dataset images were extracted, followed by a sample dataset image.
Non-text original image
Text original image
Sample dataset image (with subtitle text)
|Non_Text Training Dataset.7z||data||(79 MB)||10||200 * 40 - pixel non-text Training images|
|Text Training Dataset.7z||data||(58 MB)||12||200 * 40 - pixel text Training images|
|Non_Text Testing Dataset.7z||data||(22 MB)||7||200 * 40 - pixel non-text Testing images|
|Text Testing Dataset.7z||data||(13 MB)||9||200 * 40 - pixel text Testing images|