Language-integrated provenance in Haskell

03/27/2018
by Jan Stolarek, et al.

Scientific progress increasingly depends on data management, particularly to clean and curate data so that it can be systematically analyzed and reused. A wealth of techniques for managing and curating data (and its provenance) have been proposed, largely in the database community. In particular, a number of influential papers have proposed collecting provenance information explaining where a piece of data was copied from, or what other records were used to derive it. Most of these techniques, however, exist only as research prototypes and are not available in mainstream database systems. This means scientists must either implement such techniques themselves or (all too often) go without. This is essentially a code reuse problem: provenance techniques currently cannot be implemented reusably, only as ad hoc, usually unmaintained extensions to standard databases. An alternative, relatively unexplored approach is to support such techniques at a higher abstraction level, using metaprogramming or reflection techniques. Can advanced programming techniques make it easier to transfer provenance research results into practice? We build on a recent approach called language-integrated provenance, which extends language-integrated query techniques with source-to-source query translations that record provenance. In previous work, a proof of concept was developed in a research programming language called Links, which supports sophisticated Web and database programming. In this paper, we show how to adapt this approach to work in Haskell, building on top of the Database-Supported Haskell (DSH) library. Even though it seemed clear in principle that Haskell's rich programming features ought to be sufficient, implementing language-integrated provenance in Haskell required overcoming a number of technical challenges due to interactions between these capabilities.
Our implementation serves as a proof of concept showing how this combination of metaprogramming features can, for the first time, make data provenance facilities available to programmers as a library in a widely used, general-purpose language. In our work we were successful in implementing forms of provenance known as where-provenance and lineage. We have tested our implementation using a simple database and query set and established that the resulting queries are executed correctly on the database. Our implementation is publicly available on GitHub. Our work makes provenance tracking available to users of DSH at little cost. Although Haskell is not widely used for scientific database development, our work suggests which language features are necessary to support provenance as a library. We also highlight how combining Haskell's advanced type programming features can lead to unexpected complications, which may motivate further research into type system expressiveness.
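
To make the two forms of provenance named above more concrete, here is a minimal plain-Haskell sketch. It is not the DSH API and not the authors' implementation: the `Source` and `Prov` types, the `agencies` table, its sample rows, and both query functions are hypothetical names chosen purely for illustration. Where-provenance records the table, column, and row a value was copied from; lineage records which source rows contributed to a result.

```haskell
module Main where

-- Hypothetical where-provenance annotation: the table, column, and row
-- that a value was copied from. This type is an assumption for illustration.
data Source = Source
  { srcTable  :: String
  , srcColumn :: String
  , srcRow    :: Int
  } deriving (Eq, Show)

-- A value paired with its where-provenance.
data Prov a = Prov
  { value     :: a
  , whereProv :: Source
  } deriving (Eq, Show)

-- A row of a hypothetical in-memory "agencies" table (sample data is made up).
data Agency = Agency
  { agencyId    :: Int
  , agencyName  :: String
  , agencyPhone :: String
  }

agencies :: [Agency]
agencies =
  [ Agency 1 "EdinTours" "412-1200"
  , Agency 2 "HighlandTrips" "607-3000"
  ]

-- Where-provenance: each returned phone number carries the location of
-- the table cell it was copied from.
phonesOf :: String -> [Prov String]
phonesOf name =
  [ Prov (agencyPhone a) (Source "agencies" "phone" (agencyId a))
  | a <- agencies
  , agencyName a == name
  ]

-- Lineage: each result is paired with the (table, row) identifiers of the
-- source rows that contributed to it.
namesWithLineage :: [(String, [(String, Int)])]
namesWithLineage =
  [ (agencyName a, [("agencies", agencyId a)]) | a <- agencies ]

main :: IO ()
main = do
  mapM_ print (phonesOf "EdinTours")
  mapM_ print namesWithLineage
```

In this sketch the annotations are attached by hand inside each query. The approach described in the abstract instead derives provenance-recording queries through source-to-source query translation, so the programmer does not write the bookkeeping manually.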


