Assessing Achievability of Queries and Constraints

by Rada Chirkova, et al.

Assessing and improving the quality of data in data-intensive systems are fundamental challenges that have given rise to numerous applications targeting the transformation and cleaning of data. However, while schema design, data cleaning, and data migration are now reasonably well understood in isolation, little attention has been paid to the interplay between the tools that address issues in these areas. We focus on the problem of determining whether there exist sequences of data-transforming procedures that, when applied to the (untransformed) input data, yield data satisfying the conditions required for performing a given task. Our goal is to develop a framework that addresses this problem, starting with the relational setting. In this paper we abstract data-processing tools as black-box procedures. This abstraction describes a procedure by specifying which parts of the database the procedure might modify, together with the constraints that specify the required states of the database before and after the procedure is applied. We then study fundamental algorithmic questions arising in this context, such as understanding when one can guarantee that sequences of procedures apply to original or transformed data, when they succeed at improving the data, and when knowledge bases can represent the outcomes of procedures. Finally, we turn to the problem of determining whether applying a sequence of procedures to a database results in the satisfaction of properties specified by either queries or constraints. We show that this problem is decidable for some broad and realistic classes of procedures and properties, even when procedures are allowed to alter the schema of instances.
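To make the black-box abstraction more concrete, the following is a minimal Python sketch and not the paper's formalism. It assumes a database instance is a dictionary from relation names to sets of tuples, that a procedure's scope is simply a set of relation names it may modify, and that pre- and postconditions are Boolean functions over instances. The names `Procedure`, `is_possible_outcome`, and the example relations `Customer` and `Orders` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Assumption: an instance maps each relation name to a set of tuples.
Instance = Dict[str, Set[Tuple]]
Constraint = Callable[[Instance], bool]


@dataclass(frozen=True)
class Procedure:
    """Hypothetical black-box procedure: we know only what it may touch
    and what must hold before and after it runs, not how it works."""
    name: str
    scope: FrozenSet[str]   # relations the procedure is allowed to modify
    pre: Constraint         # must hold on the input instance
    post: Constraint        # must hold on the output instance


def is_possible_outcome(proc: Procedure, before: Instance, after: Instance) -> bool:
    """Return True if `after` is consistent with applying `proc` to `before`."""
    if not proc.pre(before):
        return False  # the procedure is not applicable to `before`
    for rel in set(before) | set(after):
        # Relations outside the scope must be left unchanged.
        if rel not in proc.scope and before.get(rel, set()) != after.get(rel, set()):
            return False
    return proc.post(after)


# Illustrative cleaning step: afterwards no Customer tuple has a missing email.
clean_emails = Procedure(
    name="clean_emails",
    scope=frozenset({"Customer"}),
    pre=lambda db: len(db.get("Customer", set())) > 0,
    post=lambda db: all(email is not None for (_, email) in db.get("Customer", set())),
)

before = {"Customer": {(1, "a@x.org"), (2, None)}, "Orders": {(10, 1)}}
cleaned = {"Customer": {(1, "a@x.org")}, "Orders": {(10, 1)}}
touched_orders = {"Customer": {(1, "a@x.org")}, "Orders": set()}

print(is_possible_outcome(clean_emails, before, cleaned))         # True
print(is_possible_outcome(clean_emails, before, touched_orders))  # False
```

Under these assumptions, checking a sequence of procedures amounts to chaining such checks. The achievability questions in the abstract are harder: they ask whether some outcome, or every outcome, of a sequence satisfies a target query or constraint, without the concrete outputs being given.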




