Property-based testing for Spark Streaming
Stream processing has reached the mainstream in the last years, as a new generation of open source distributed stream processing systems, designed for scaling horizontally on commodity hardware, has brought the capability for processing high volume and high velocity data streams to companies of all sizes. In this work we propose a combination of temporal logic and property-based testing (PBT) for dealing with the challenges of testing programs that employ this programming model. We formalize our approach in a discrete time temporal logic for finite words, with some additions to improve the expressiveness of properties, which includes timeouts for temporal operators and a binding operator for letters. In particular we focus on testing Spark Streaming programs written with the Spark API for the functional language Scala, using the PBT library ScalaCheck. For that we add temporal logic operators to a set of new ScalaCheck generators and properties, as part of our testing library sscheck. Under consideration in Theory and Practice of Logic Programming (TPLP).
READ FULL TEXT