Competition on HArvesting Raw Tables (CHART) 2019 - PubMedCentral (ICDAR-CHART-2019-PMC)

2019-06-12 (v. 1)

Contact author

Kenny Davila

University at Buffalo


You can cite this dataset as: Kenny Davila, Competition on HArvesting Raw Tables (CHART) 2019 - PubMedCentral (ICDAR-CHART-2019-PMC), 1, ID:ICDAR-CHART-2019-PMC_1, URL:

Dataset Information


Chart, Real, Plots, Information Graphics, PubMedCentral


This is a dataset of manually annotated chart images extracted from the PubMed Central Open Access set. There are 4 basic types of charts: Bar, Line, Scatter, and Box. The dataset supports several tasks, including:

1) Chart Classification

2) Text Detection and Recognition

3) Text Role Classification

4) Axis Analysis

5) Legend Analysis

A total of 4242 images have been annotated for Task 1, 200 for Task 2 and 200 for Tasks 3 to 5. For more information, please visit or contact:

Technical Details

The dataset release consists of two indices in CSV format: the list of publications that must be downloaded from PubMed Central, and the list of images used for each task from each of these publications. Along with these indices, we include Python scripts that can be used to download and uncompress the required images. Finally, we include the ground truth annotations for each image and each task, both in the original XML format used by the annotation tools and in the JSON format used by the evaluation tools of the original ICDAR CHART 2019 Competition.
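As a rough illustration of how the image index and the downloaded publications fit together, the sketch below groups image names by publication and then pulls only those images out of one downloaded package. It is not the bundled download script. The column names `pmc_id` and `image_file`, the helper names, and the assumption that each publication arrives as a `.tar.gz` package are all hypothetical; check the header of the real CSV index and the scripts shipped with the release before adapting it.

```python
import csv
import io
import os
import tarfile
from collections import defaultdict


def group_images_by_publication(index_csv: str) -> dict[str, list[str]]:
    """Map each publication ID to the image files it contributes.

    Hypothetical columns: 'pmc_id' and 'image_file'. Adjust these to
    match the actual header of the image index CSV.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(index_csv)):
        groups[row["pmc_id"]].append(row["image_file"])
    return dict(groups)


def extract_images(archive_path: str, wanted: list[str], out_dir: str) -> list[str]:
    """Extract only the listed image files from one .tar.gz package.

    Matches on the base file name (packages typically nest files in a
    per-article folder) and writes each match flat into out_dir, which
    also avoids honoring any directory paths stored in the archive.
    Returns the paths that were written.
    """
    wanted_set = set(wanted)
    written: list[str] = []
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            name = os.path.basename(member.name)
            if member.isfile() and name in wanted_set:
                src = tar.extractfile(member)
                dest = os.path.join(out_dir, name)
                with open(dest, "wb") as fh:
                    fh.write(src.read())
                written.append(dest)
    return written
```

Extracting by base name keeps the output folder flat, which is convenient when the task lists refer to images by file name only; if two publications could share an image name, write each publication into its own subfolder instead.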

ICDAR_CHART2019_PMC_Test_Dataset_v1.0.zip (data, 4 MB): ICDAR CHART 2019 - PubMed Central Test Set v1.0
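For a first look at the ground truth, the sketch below tallies chart-type labels across the JSON files inside a zip archive such as the test set, without unpacking it. The key path `task1` → `output` → `chart_type` is an assumption about the JSON layout, not a documented schema, and `count_chart_types` is a hypothetical helper; open one JSON file first and adjust `key_path` before trusting the counts.

```python
import json
import zipfile
from collections import Counter


def count_chart_types(zip_path: str,
                      key_path=("task1", "output", "chart_type")) -> Counter:
    """Tally chart-type labels across every JSON file inside a zip archive.

    key_path is an ASSUMED location of the label inside each JSON file;
    files that lack it (e.g. annotations for other tasks) are skipped.
    """
    counts: Counter = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Only JSON files; skip macOS metadata folders if present.
            if not name.lower().endswith(".json") or "__MACOSX" in name:
                continue
            data = json.loads(zf.read(name))
            # Walk down the assumed key path, giving up on any missing key.
            for key in key_path:
                if not isinstance(data, dict) or key not in data:
                    data = None
                    break
                data = data[key]
            if isinstance(data, str):
                counts[data] += 1
    return counts
```

Skipping files that lack the key path, rather than raising, lets the same function run over an archive that mixes annotations for several tasks.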