Profiling and Optimizing Java Streams

02/20/2023
by Eduardo Rosales, et al.

The Stream API was added in Java 8 to allow the declarative expression of data-processing logic, typically map-reduce-like data transformations on collections and datasets. The Stream API introduces two key abstractions: the stream, which is a sequence of elements available in a data source, and the stream pipeline, which contains operations (e.g., map, filter, reduce) that are applied to the elements in the stream upon execution. Streams are becoming popular among Java developers because they leverage the conciseness of functional programming and ease the parallelization of data processing. Despite these benefits, streams can introduce significant overheads compared with data processing that relies on imperative code, mainly caused by extra object allocations and reclamations and by the use of virtual method calls. As a result, developers need means to study the runtime behavior of streams, with the goal of both mitigating such abstraction overheads and optimizing stream processing. Unfortunately, there is a lack of dedicated tools able to dynamically analyze streams to help developers locate the specific issues degrading application performance. In this paper, we address the profiling and optimization of streams. We present a novel profiling technique for measuring the computations performed by a stream in terms of elapsed reference cycles, which we use to locate problematic streams with a major impact on application performance. While accuracy is crucial to this end, the inserted instrumentation code causes the execution of extra cycles, which are partially included in the profiles. To mitigate this issue, we estimate and compensate for the extra cycles caused by the inserted instrumentation code. We implement our approach in StreamProf, which, to the best of our knowledge, is the first dedicated stream profiler for the Java Virtual Machine (JVM).
With StreamProf, we find that cycle profiling is effective at detecting problematic streams whose optimization can enable significant performance gains. We also find that accurately profiling the tasks that support parallel stream processing allows load imbalance to be diagnosed from the distribution of stream-related cycles at the thread level. We conduct an evaluation on sequential and parallel stream-based workloads that are publicly available from three different sources. The evaluation shows that our profiling technique is efficient and yields accurate profiles. Moreover, we show the actionability of our profiles by guiding stream-related optimizations on two workloads from Renaissance. Our optimizations require the modification of only a few lines of code while achieving speedups of up to a factor of 5. Java streams have been extensively studied in recent work, focusing on both how developers use streams and how to optimize them. Current approaches to the optimization of streams mainly rely on static analysis techniques that overlook runtime information, suffer from important limitations in detecting all streams executed by a Java application, or are not suitable for the analysis of parallel streams. Understanding the dynamic behavior of both sequential and parallel stream processing and its impact on application performance is crucial to help users make better decisions while using streams.
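To make the two abstractions named in the abstract concrete, the following minimal sketch writes the same computation once as a stream pipeline (filter, map, reduce) and once as an imperative loop. It is an illustration only, not code from the paper or from StreamProf; the class name `StreamVsLoop`, the method names, and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class StreamVsLoop {

    // Declarative version: the list is the data source, and the pipeline
    // chains filter -> map -> reduce. Nothing runs until the terminal
    // operation (reduce) is invoked. Elements flow through as boxed Integer
    // objects and through lambda call sites, which is the kind of
    // abstraction overhead the abstract refers to.
    static int sumOfEvenSquaresStream(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)   // keep even elements
                .map(v -> v * v)           // square each remaining element
                .reduce(0, Integer::sum);  // fold the results into one sum
    }

    // Same pipeline executed in parallel. The operations are stateless and
    // the reduction is associative, so the result matches the sequential one.
    static int sumOfEvenSquaresParallel(List<Integer> values) {
        return values.parallelStream()
                .filter(v -> v % 2 == 0)
                .map(v -> v * v)
                .reduce(0, Integer::sum);
    }

    // Imperative equivalent: a single loop with a primitive accumulator.
    static int sumOfEvenSquaresLoop(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            if (v % 2 == 0) {
                sum += v * v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
        // 2*2 + 4*4 + 6*6 = 56 for all three variants
        System.out.println(sumOfEvenSquaresStream(values));
        System.out.println(sumOfEvenSquaresParallel(values));
        System.out.println(sumOfEvenSquaresLoop(values));
    }
}
```

All three variants return the same value. Whether the stream versions are measurably slower than the loop depends on the workload and on JIT optimizations, which is the kind of question a dynamic profiler is meant to answer.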


research
07/06/2020

Multi-tenant Pub/Sub Processing for Real-time Data Streams

Devices and sensors generate streams of data across a diversity of locat...
research
03/25/2021

Understanding the Challenges and Assisting Developers with Developing Spark Applications

To process data more efficiently, big data frameworks provide data abstr...
research
07/18/2023

Stream Types

We propose a rich foundational theory of typed data streams and stream t...
research
11/24/2022

Highest-performance Stream Processing

We present the stream processing library that achieves the highest perfo...
research
06/20/2022

Phoebe: QoS-Aware Distributed Stream Processing through Anticipating Dynamic Workloads

Distributed Stream Processing systems have become an essential part of b...
research
09/30/2022

Towards effective assessment of steady state performance in Java software: Are we there yet?

Microbenchmarking is a widely used form of performance testing in Java s...
research
02/14/2022

Enhancing expressivity of checked corecursive streams (extended version)

We propose a novel approach to stream definition and manipulation. Our s...
