Competition on HArvesting Raw Tables (CHART) 2019 - PubMedCentral (ICDAR-CHART-2019-PMC)

2019-06-12 (v. 1)

Contact author

Kenny Davila

University at Buffalo

kxd7282@rit.edu

4845533582

You can cite this dataset as: Kenny Davila, Competition on HArvesting Raw Tables (CHART) 2019 - PubMedCentral (ICDAR-CHART-2019-PMC), 1, ID: ICDAR-CHART-2019-PMC_1, URL: https://tc11.cvc.uab.es/datasets/ICDAR-CHART-2019-PMC_1

Dataset Information

Keywords

Chart, Real, Plots, Information Graphics, PubMedCentral

Description

This is a dataset of manually annotated chart images extracted from the PubMed Central Open Access set. There are four basic chart types: Bar, Line, Scatter, and Box. Several tasks are associated with this dataset:

1) Chart Classification

2) Text Detection and Recognition

3) Text Role Classification

4) Axis Analysis

5) Legend Analysis

A total of 4242 images have been annotated for Task 1, 200 for Task 2 and 200 for Tasks 3 to 5. For more information, please visit https://chartinfo.github.io/ or contact: kennydav@buffalo.edu

Technical Details

The dataset release consists of two indices in CSV format: the list of publications that must be downloaded from PubMed Central, and the list of images from each of these publications that have been used for each task. Along with these indices, we include Python scripts that can be used to download and uncompress the required images. We also include the ground truth annotations for each image and task, both in the original XML format used by the annotation tools and in the JSON format used by the evaluation tools of the original ICDAR CHART 2019 Competition.
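
As a rough illustration of how an image index like the one described above might be consumed, the following minimal sketch groups image names by publication. The column names `publication_id` and `image_name`, the function `images_by_publication`, and the sample rows are all hypothetical placeholders; the real headers and identifiers are defined by the CSV files in the release and should be checked there. The sketch does not download anything and does not replace the scripts shipped with the dataset.

```python
import csv
import io
from collections import defaultdict


def images_by_publication(index_file, pub_col="publication_id", img_col="image_name"):
    """Group image names by publication from an index CSV.

    pub_col and img_col are hypothetical column names; replace them
    with the actual headers found in the released index files.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(index_file):
        grouped[row[pub_col]].append(row[img_col])
    return dict(grouped)


# In-memory stand-in for an image index (placeholder values, not real data).
sample = io.StringIO(
    "publication_id,image_name\n"
    "PMC0000001,fig1.jpg\n"
    "PMC0000001,fig2.jpg\n"
    "PMC0000002,fig1.jpg\n"
)

grouped = images_by_publication(sample)
for pub, images in grouped.items():
    # One line per publication with the number of images it contributes.
    print(pub, len(images))
```

With a real index, the same function could be called on an opened file, for example `open(path, newline="")`, once the column names have been adjusted.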

File: ICDAR_CHART2019_PMC_Test_Dataset_v1.0.zip
Type: data
Size: 4 MB
Downloads: 126
Description: ICDAR CHART 2019 - Pub Med Central Test Set v 1.0