Scaling Systematic Literature Reviews with Machine Learning Pipelines

Systematic reviews, which entail extracting data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but they are very time-consuming and require experts. Yet the three main stages of a systematic review can readily be automated: searching for documents can be done via APIs and scrapers, selection of relevant documents via binary classification, and extraction of data via sequence-labelling classification. Despite the promise of automation for this field, little research examines the various ways to automate each of these tasks. We construct a pipeline that automates each of these stages, and experiment with many trade-offs between human time and system quality. We test the ability of classifiers to work well on small amounts of data and to generalise to data from countries not represented in the training data. We test different types of data extraction, which vary in annotation difficulty, and five different neural architectures for performing the extraction. We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation, which is only 15% of the time it takes to do the whole review manually, and the pipeline can be repeated and extended to new data with no additional effort.
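The three stages named in the abstract (search, binary selection, sequence-labelling extraction) can be sketched as a chain of pluggable functions. This is a minimal illustration, not the authors' implementation: the `Document` type, the stage signatures, the BIO tag scheme, and the toy stand-ins (`toy_search`, `toy_is_relevant`, `toy_tag`) with their example sentences are all assumptions made here for the sake of the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Document:
    doc_id: str
    tokens: List[str]


# Hypothetical stage signatures. Real stages would wrap an API client or
# scraper, a trained binary classifier, and a trained sequence labeller.
Searcher = Callable[[str], Iterable[Document]]
Selector = Callable[[Document], bool]
Tagger = Callable[[List[str]], List[str]]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token BIO tags (assumed scheme) into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        starts_span = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_span:
            # Close any open span, then open a new one.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the currently open span.
            words.append(token)
        else:
            # "O" tag: close any open span.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def run_pipeline(
    query: str, search: Searcher, is_relevant: Selector, tag: Tagger
) -> Dict[str, List[Tuple[str, str]]]:
    """Search, keep relevant documents, and extract labelled spans."""
    results = {}
    for doc in search(query):
        if not is_relevant(doc):
            continue
        results[doc.doc_id] = decode_bio(doc.tokens, tag(doc.tokens))
    return results


# Toy stand-ins (invented data) so the sketch runs end to end.
def toy_search(query: str) -> List[Document]:
    return [
        Document("d1", "Trial enrolled 120 farmers in Kenya".split()),
        Document("d2", "Recipe for lentil soup".split()),
    ]


def toy_is_relevant(doc: Document) -> bool:
    return "Trial" in doc.tokens


def toy_tag(tokens: List[str]) -> List[str]:
    return [
        "B-SAMPLE" if t.isdigit() else "B-COUNTRY" if t == "Kenya" else "O"
        for t in tokens
    ]
```

Calling `run_pipeline("cash transfers", toy_search, toy_is_relevant, toy_tag)` keeps only `d1` and returns its extracted spans. Keeping each stage behind a plain callable is one way to swap classifiers or architectures when comparing trade-offs of the kind the abstract describes.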




Improving Active Learning in Systematic Reviews

Systematic reviews are essential to summarizing the results of different...

A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora

A systematic review identifies and collates various clinical studies and...

Immigration Document Classification and Automated Response Generation

In this paper, we consider the problem of organizing supporting document...

Automating Systematic Literature Reviews with Natural Language Processing and Text Mining: a Systematic Literature Review

Objectives: An SLR is presented focusing on text mining based automation...

A Level-wise Taxonomic Perspective on Automated Machine Learning to Date and Beyond: Challenges and Opportunities

Automated machine learning (AutoML) is essentially automating the proces...

Informed Machine Learning, Centrality, CNN, Relevant Document Detection, Repatriation of Indigenous Human Remains

Among the pressing issues facing Australian and other First Nations peop...

Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper

Systematic reviews, which summarize and synthesize all the current resea...
