Competition on HArvesting Raw Tables (CHART) 2019 - PubMedCentral (ICDAR-CHART-2019-PMC)
Dataset Information
Keywords
Chart, Real, Plots, Information Graphics, PubMedCentral
Description
This is a dataset of manually annotated chart images extracted from the PubMed Central Open Access set. It covers four basic chart types: Bar, Line, Scatter, and Box. Several tasks are associated with this dataset, including:
1) Chart Classification
2) Text Detection and Recognition
3) Text Role Classification
4) Axis Analysis
5) Legend Analysis
A total of 4242 images have been annotated for Task 1, 200 for Task 2 and 200 for Tasks 3 to 5. For more information, please visit https://chartinfo.github.io/ or contact: kennydav@buffalo.edu
Technical Details
The dataset release consists of two indices in CSV format: the list of publications that must be downloaded from PubMed Central, and the list of images from each of these publications that were used for each task. Along with these indices, we include Python scripts that can be used to download and uncompress the required images. We also include the ground truth annotations for each image and each task, both in the original XML format used by the annotation tools and in the JSON format used by the evaluation tools of the original ICDAR CHART 2019 Competition.
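As a rough illustration of how the image index might be consumed before downloading, the sketch below groups image names by publication. The column names `publication_id` and `image_name`, the function name, and the sample rows are all hypothetical; the actual CSV headers in the release may differ and should be checked first. The bundled download scripts remain the supported way to fetch the images.

```python
import csv
import io
from collections import defaultdict


def images_by_publication(index_csv_text, pub_col="publication_id", image_col="image_name"):
    """Group image names by publication from an index CSV.

    The column names are hypothetical placeholders; check the header
    of the actual index files in the release and adjust them.
    """
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(index_csv_text))
    for row in reader:
        grouped[row[pub_col]].append(row[image_col])
    return dict(grouped)


# Made-up sample rows, for illustration only (not real dataset entries).
sample = (
    "publication_id,image_name\n"
    "PUB_A,fig1.jpg\n"
    "PUB_A,fig2.jpg\n"
    "PUB_B,fig1.jpg\n"
)
print(images_by_publication(sample))
```

Grouping by publication first means each publication archive only needs to be downloaded and uncompressed once, even when several of its images are used across tasks.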
File | Type | Size | Downloads | Description |
---|---|---|---|---|
ICDAR_CHART2019_PMC_Test_Dataset_v1.0.zip | data | (4 MB) | 126 | ICDAR CHART 2019 - Pub Med Central Test Set v 1.0 |