Tuneful: An Online Significance-Aware Configuration Tuner for Big Data Analytics

01/22/2020
by   Ayat Fekry, et al.
0

Distributed analytics engines such as Spark are a common choice for processing extremely large datasets. However, finding good configurations for these systems remains challenging, with each workload potentially requiring a different setup to run optimally. Using suboptimal configurations incurs significant extra runtime costs. gaining traction within data-scientists communities where awareness of such issues is relatively low. We propose Tuneful, an approach that efficiently tunes the configuration of in-memory cluster computing systems. Tuneful combines incremental Sensitivity Analysis and Bayesian optimization to identify near-optimal configurations from a high-dimensional search space, using a small number of executions. This setup allows the tuning to be done online, without any previous training. Our experimental results show that Tuneful reduces the search time for finding close-to-optimal configurations by 62% (at the median) when compared to existing state-of-the-art techniques. This means that the amortization of the tuning cost happens significantly faster, enabling practical tuning for new classes of workloads.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset