Dataset search: a survey

by Adriane Chapman, et al.

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently released a beta version of a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward.



