OneStopTuner: An End to End Architecture for JVM Tuning of Spark Applications

09/07/2020
by   Venktesh V, et al.

Java is the backbone of widely used big data frameworks such as Apache Spark, owing to its productivity, the portability of JVM-based execution, and its rich set of libraries. However, the performance of these applications can vary widely depending on the runtime flags chosen from among all available JVM flags. Manually tuning these flags is both cumbersome and error-prone. Automated tuning approaches can ease the task, but current solutions either require considerable processing time or target only a subset of flags to limit time and space requirements. In this paper, we present OneStopTuner, a novel machine learning based framework for autotuning JVM flags. OneStopTuner controls the amount of data generation by leveraging batch mode active learning to characterize the user application. Based on the user-selected optimization metric, OneStopTuner then discards irrelevant JVM flags by applying feature selection algorithms to the generated data. Finally, it employs sample-efficient methods such as Bayesian optimization and regression-guided Bayesian optimization on the shortlisted JVM flags to find the optimal values for the chosen set of flags. We evaluated OneStopTuner on widely used Spark benchmarks and compared its performance with the traditional simulated annealing based autotuning approach. We demonstrate that, for optimizing execution time, the flags chosen by OneStopTuner provide a speedup of up to 1.35x over default Spark execution, compared to a 1.15x speedup from the flag configurations proposed by simulated annealing. OneStopTuner was able to reduce the number of executions for data generation by 70% and to suggest the optimal flag configuration 2.4x faster than the standard simulated annealing based approach, excluding the time for data generation.
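The final stage the abstract describes, a sample-efficient Bayesian optimization search over the shortlisted flags, can be illustrated with a toy loop. Everything below is a hypothetical sketch, not OneStopTuner's implementation: the `runtime` function stands in for actually executing a Spark job, the single normalized "heap fraction" flag and all Gaussian process hyperparameters (RBF lengthscale, noise level, the lower-confidence-bound acquisition) are assumptions, and the real system tunes many flags jointly.

```python
import math
import random

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel on scalar flag values."""
    return math.exp(-((a - b) ** 2) / (2 * lengthscale ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for small linear systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, xq, noise=0.01):
    """Posterior mean/variance of a zero-mean GP fit on mean-centered ys."""
    n = len(xs)
    m = sum(ys) / n
    yr = [y - m for y in ys]
    amp = max(sum(r * r for r in yr) / n, 1e-6)  # crude signal-variance estimate
    K = [[amp * rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    kq = [amp * rbf(x, xq) for x in xs]
    mu = m + sum(k * a for k, a in zip(kq, solve(K, yr)))
    var = max(1e-12, amp - sum(k * v for k, v in zip(kq, solve(K, kq))))
    return mu, var

def runtime(heap_frac):
    # Hypothetical stand-in for one Spark benchmark run: execution time in
    # seconds as a noisy function of a normalized heap-size flag in [0, 1].
    return 100 + 40 * (heap_frac - 0.6) ** 2 + random.gauss(0, 0.1)

def bayes_opt(budget=10):
    """Minimize runtime over a flag grid with a lower-confidence-bound rule."""
    random.seed(0)
    grid = [i / 50 for i in range(51)]
    xs = [0.1, 0.9]                      # two seed measurements
    ys = [runtime(x) for x in xs]
    for _ in range(budget):
        def lcb(x):                      # favor low predicted mean or high uncertainty
            mu, var = gp_posterior(xs, ys, x)
            return mu - 2.0 * math.sqrt(var)
        xn = min((x for x in grid if x not in xs), key=lcb)
        xs.append(xn)
        ys.append(runtime(xn))           # one "benchmark execution" per iteration
    return xs[ys.index(min(ys))]         # best flag value observed
```

With only a dozen simulated benchmark runs, the loop concentrates samples near the (synthetic) optimum around 0.6, which is the sample-efficiency property that motivates using Bayesian optimization instead of exhaustive or annealing-style search.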

