Relational Algebra for In-Database Process Mining

by Remco Dijkman, et al.

The execution logs that are used for process mining in practice are often obtained by querying an operational database and storing the result in a flat file. Consequently, the data processing power of the database system can no longer be used for this information, which constrains flexibility in defining mining patterns and limits execution performance when mining large logs. Enabling process mining directly on a database, instead of via intermediate storage in a flat file, therefore provides additional flexibility and efficiency. To help realize this ideal of in-database process mining, this paper formally defines a database operator that extracts the 'directly follows' relation from an operational database. This operator can be used both to do in-database process mining and to flexibly evaluate process-mining-related queries, such as: "which employee most frequently changes the 'amount' attribute of a case from one task to the next?" We define the operator using the well-known relational algebra that forms the formal underpinning of relational databases. We formally prove equivalence properties of the operator that are useful for query optimization and present its time-complexity properties. By doing so, this paper formally defines the relational algebraic elements of a 'directly follows' operator that are required to implement such an operator in a DBMS.
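
To make the 'directly follows' relation concrete, the following minimal sketch computes it inside a database using only a self-join and an anti-join, which correspond to plain relational algebra operations. It is an illustration, not the operator defined in the paper: the table name `log`, its columns `case_id`, `activity` and `ts`, and the function name `directly_follows` are all assumptions, as is the premise that timestamps are distinct within a case.

```python
import sqlite3


def directly_follows(conn):
    """Return (case_id, activity, next_activity) for consecutive events.

    Assumes a hypothetical table log(case_id, activity, ts) in which
    timestamps are distinct within each case.
    """
    query = """
        SELECT a.case_id, a.activity, b.activity
        FROM log AS a
        JOIN log AS b
          ON a.case_id = b.case_id AND a.ts < b.ts
        WHERE NOT EXISTS (
            -- no third event of the same case lies strictly between a and b
            SELECT 1 FROM log AS c
            WHERE c.case_id = a.case_id
              AND c.ts > a.ts AND c.ts < b.ts
        )
        ORDER BY a.case_id, a.ts
    """
    return conn.execute(query).fetchall()


# Small in-memory example log with two cases (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (case_id INTEGER, activity TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        (1, "register", 1), (1, "check", 2), (1, "pay", 3),
        (2, "register", 1), (2, "pay", 2),
    ],
)
pairs = directly_follows(conn)
```

Because the result stays in the database, it can be filtered, joined with other tables, or aggregated in the same query, which is the kind of flexibility the abstract argues for.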




