Extracting data from vector figures in scholarly articles

09/06/2017
by Chris Hartgerink, et al.

It is common for authors to communicate their results in graphical figures, but the underlying data are frequently unavailable for reanalysis. Reconstructing data points from a figure manually requires the author to measure the coordinates either on printed pages using a ruler or on the display screen using a cursor. This is time-consuming (often hours), error-prone, and limited by the precision of the display or ruler. What is often not realised is that, if the figure is stored in vector format, the data themselves are held in the PDF document to much higher precision (usually 0.0-0.01 pixels). We developed alpha software to automatically reconstruct data from vector figures and tested it on funnel plots in the meta-analysis literature. Our results indicate that reconstructing data from vector-based figures is promising: we correctly extracted data for 12 out of 24 funnel plots with extracted data (50%). However, we observed that vector-based figures are relatively sparse (15 out of 136 papers with funnel plots), and we strongly urge publishers to provide more vector-based data figures in the near future for the benefit of the scholarly community.
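To illustrate why vector figures allow precise reconstruction, here is a minimal sketch, not the authors' software. It assumes the figure has already been converted from PDF to SVG, that each data point is drawn as a `<circle>` element, and that two reference positions per axis are known (for example, read from tick labels). The SVG fragment, the calibration values, and the function names `make_axis_map` and `extract_points` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical SVG fragment standing in for a funnel plot whose points are
# stored as vector circles (an assumption; real figures vary widely).
SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
  <circle cx="50.0" cy="150.0" r="2"/>
  <circle cx="100.0" cy="100.0" r="2"/>
  <circle cx="125.0" cy="50.0" r="2"/>
</svg>"""


def make_axis_map(pix_a, val_a, pix_b, val_b):
    """Return a linear function from a pixel coordinate to a data value,
    calibrated on two known (pixel, value) pairs along one axis."""
    scale = (val_b - val_a) / (pix_b - pix_a)
    return lambda pix: val_a + (pix - pix_a) * scale


def extract_points(svg_text, x_map, y_map):
    """Read every circle centre and convert it to data coordinates."""
    ns = {"svg": "http://www.w3.org/2000/svg"}
    root = ET.fromstring(svg_text)
    points = []
    for circle in root.iterfind(".//svg:circle", ns):
        cx = float(circle.get("cx"))
        cy = float(circle.get("cy"))
        points.append((x_map(cx), y_map(cy)))
    return points


# Assumed calibration: x pixels 0..200 span effect sizes -1.0..1.0;
# y pixels 0..200 span standard errors 0.0..0.4 (zero at the top,
# as is conventional for funnel plots).
x_map = make_axis_map(0.0, -1.0, 200.0, 1.0)
y_map = make_axis_map(0.0, 0.0, 200.0, 0.4)

points = extract_points(SVG, x_map, y_map)
for effect, se in points:
    print(f"effect={effect:.3f}  se={se:.3f}")
```

Because the circle centres are read directly from the file rather than measured on screen, the only error introduced here comes from the axis calibration and floating-point arithmetic. Real figures may encode points as paths, apply nested transforms, or use non-linear axes, none of which this sketch handles.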

research
09/09/2022

Pitfalls and Guidelines for Using Time-Based Git Data

Many software engineering research papers rely on time-based data (e.g.,...
research
12/15/2019

NaïveRole: Author-Contribution Extraction and Parsing from Biomedical Manuscripts

Information about the contributions of individual authors to scientific ...
research
03/21/2021

Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data

Many software engineering research papers rely on time-based data (e.g.,...
research
04/05/2016

Learning to Generate Posters of Scientific Papers

Researchers often summarize their work in the form of posters. Posters p...
research
03/15/2019

Availability of Hyperlinked Resources in Astrophysics Papers

Astrophysics papers often rely on software which may or may not be avail...
research
04/18/2008

Size matters: performance declines if your pixels are too big or too small

We present a conceptual model that describes the effect of pixel size on...
research
02/21/2017

Learning to Generate Posters of Scientific Papers by Probabilistic Graphical Models

Researchers often summarize their work in the form of scientific posters...
