AIR – A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing

01/01/2020
by Vinu E. Venugopal, et al.

Distributed Stream Processing Systems (DSPSs) are currently among the fastest-emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. The major players in this domain are Apache Spark and Flink, which provide a variety of frontend APIs for SQL, statistical inference, machine learning, stream processing, and many others. Yet few details have been reported on how these engines integrate with the underlying High-Performance Computing (HPC) infrastructure or on the communication protocols they use. Spark and Flink, for example, are implemented in Java and still rely on a dedicated master node to manage the control flow among the worker nodes in a compute cluster. In this paper, we describe the architecture of our AIR engine, which is designed from scratch in C++ using the Message Passing Interface (MPI) and pthreads for multithreading, and which is deployed directly on top of a common HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding protocol (referred to as "Asynchronous Iterative Routing"), which facilitates direct and asynchronous communication among all client nodes and thereby completely avoids the overhead induced by the control flow with a master node, which may otherwise form a performance bottleneck. Our experiments over a variety of benchmark settings confirm that AIR outperforms Spark and Flink in terms of latency and throughput by a factor of up to 15; moreover, we demonstrate that AIR scales out much better than existing DSPSs to clusters consisting of up to 8 nodes and 224 cores.
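To make the idea of routing without a master node more concrete, the following is a minimal sketch of one way a worker could decide, purely locally, which peer should receive each record. It is not the protocol described in the paper: the hash-modulo scheme and the names `fnv1a`, `owner_rank`, and `shard_batch` are assumptions introduced only for illustration, and the actual message exchange (for example over MPI) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the paper): FNV-1a hash. A fixed, explicit
// hash is used so that every worker maps the same key to the same value.
std::uint64_t fnv1a(const std::string& key) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Assumed sharding rule: the owner of a key is hash(key) mod world_size.
// Every worker can evaluate this on its own, so no master has to be asked.
int owner_rank(const std::string& key, int world_size) {
    assert(world_size > 0);
    return static_cast<int>(fnv1a(key) % static_cast<std::uint64_t>(world_size));
}

// Split a local batch of keys into one outbox per destination rank.
// In a real engine each outbox would then be sent to its peer asynchronously.
std::vector<std::vector<std::string>> shard_batch(
        const std::vector<std::string>& keys, int world_size) {
    std::vector<std::vector<std::string>> outboxes(
        static_cast<std::size_t>(world_size));
    for (const auto& key : keys) {
        outboxes[static_cast<std::size_t>(owner_rank(key, world_size))]
            .push_back(key);
    }
    return outboxes;
}
```

Because the destination is a pure function of the key and the number of workers, any two workers agree on the owner of a key without exchanging control messages, which is the property the abstract attributes to avoiding a master node.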

