ICPR 2020 Competition on HArvesting Raw Tables (ICPR-2020-CHART-UB_PMC) (ICPR2020-CHART-Info)
Dataset Information
Keywords
Chart, Real, Plots, Information Graphics, PubMedCentral
Description
This is a dataset of manually annotated chart images extracted from the PubMed Central Open Access section. We selected only images that were originally published under a Creative Commons license, and we acknowledge the source of each image by including the original PMC ID in each file name.
There are 15 types of charts: Area, Heatmap, Horizontal Bar, Horizontal Interval, Line, Manhattan, Map, Pie, Scatter, Scatter-Line, Surface, Venn, Vertical Bar, Vertical Box, Vertical Interval. A subset of these (Horizontal Bar, Vertical Bar, Line, Scatter, Horizontal Box and Vertical Box) was further labeled for advanced recognition tasks.
There are several tasks associated with this dataset, including:
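Since each file name carries the PMC ID of its source article, the ID can be recovered programmatically for attribution. The following is a minimal sketch only: it assumes the ID appears as the literal prefix `PMC` followed by digits, and the helper name `pmc_id_from_filename` and the sample file names are hypothetical, not part of the release.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the PMC ID is written in the file name as "PMC" followed by digits.
PMC_ID_PATTERN = re.compile(r"PMC\d+")


def pmc_id_from_filename(filename: str) -> Optional[str]:
    """Return the first PMC ID found in a file name, or None if there is none."""
    # Only the final path component is searched, so directory names are ignored.
    match = PMC_ID_PATTERN.search(Path(filename).name)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the call.
    print(pmc_id_from_filename("PMC1234567___fig1.jpg"))
```

If the actual naming scheme differs, only the regular expression needs to change.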
1) Chart Classification
2) Text Detection and Recognition
3) Text Role Classification
4) Axis Analysis
5) Legend Analysis
6) Plot Element Detection and Recognition
7) End-to-End Data Extraction
Both training and testing datasets are included in this release. For more information, please visit https://chartinfo.github.io/ or contact: kxd7282@rit.edu
Technical Details
Annotations are provided in two formats: JSON and XML. The JSON format is consistent with the synthetic training and testing datasets used for the competition. The XML format is the extended annotation format used by our own chart annotation tools, which can be used to further refine the annotations and are available at: https://github.com/kdavila/ChartInfo_annotation_tools.
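Because every image comes with both a JSON and an XML annotation, a quick consistency check after extraction can be useful. The sketch below assumes that an image and its two annotation files share the same file stem, which is an assumption about the archive layout rather than something stated here; the function names `group_by_stem` and `missing_annotations` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Sequence


def group_by_stem(paths: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map each file stem to {extension: path} for all files sharing that stem."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for p in paths:
        path = Path(p)
        groups[path.stem][path.suffix.lower()] = str(path)
    return dict(groups)


def missing_annotations(
    groups: Dict[str, Dict[str, str]],
    required: Sequence[str] = (".json", ".xml"),
) -> List[str]:
    """Return the stems that lack at least one required annotation format."""
    return sorted(
        stem
        for stem, files in groups.items()
        if any(ext not in files for ext in required)
    )
```

A typical use would be to pass in every file path found under the extracted directory (for example via `Path(root).rglob("*")`) and inspect the list returned by `missing_annotations`.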
Training dataset: 15,636 images with their corresponding JSON and XML annotations.
Testing datasets: 7,287 images with their corresponding JSON and XML annotations.
For the testing set, the ground truth available for each image depends on the split in which the image is found. There are 5 splits in total, each used to evaluate different tasks.
File | Type | Size | Downloads | Description |
---|---|---|---|---|
release_ICPR2020_CHARTINFO_UB_PMC_TRAIN_v1.21.zip | data | 708 MB | 171 | ICPR 2020 - CHART-Infographics - UB PMC Training Dataset |
release_ICPR2020_CHARTINFO_UB_PMC_TEST_v1.0.zip | data | 665 MB | 122 | ICPR 2020 - CHART-Infographics - UB PMC Testing Dataset |
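After downloading an archive, one way to sanity-check it against the image counts reported above is to tally its members by file extension without extracting anything. This is a minimal sketch using Python's standard `zipfile` module; the function name `count_by_extension` is hypothetical, and the internal layout and image extensions of the archives are not assumed.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(zip_path: str) -> Counter:
    """Count the files inside a zip archive, keyed by lower-cased extension."""
    counts: Counter = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            counts[PurePosixPath(info.filename).suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # File name taken from the table above; the file must be in the working directory.
    print(count_by_extension("release_ICPR2020_CHARTINFO_UB_PMC_TRAIN_v1.21.zip"))
```

If the archive holds exactly one JSON and one XML file per image, the `.json` and `.xml` counts should each match the number of images reported for that dataset; that one-to-one layout is an assumption and should be verified against the actual contents.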