HDOT – an Approach Towards Productive Programming of Hybrid Applications

12/18/2019
by Jan Ciesko, et al.

MPI applications matter. However, with the advent of many-core processors, traditional MPI applications are challenged to achieve satisfactory performance. This is due to the inability of these applications to respond to load imbalances, to reduce the serialization imposed by synchronous communication patterns, to overlap communication with computation, and to cope with increasing memory overheads. The MPI specification provides asynchronous calls to mitigate some of these factors, but application developers rarely make the effort to apply them efficiently. In this work, we present a methodology for developing hybrid applications, called Hierarchical Domain Over-decomposition with Tasking (HDOT), that reduces programming effort by emphasizing the reuse of data partition schemes defined at process level and applying them at task level. This enables a top-down approach to expressing concurrency and a natural coexistence between MPI and shared-memory programming models. Our integration of MPI and OmpSs-2 shows promising results in terms of programmability and performance, measured on a set of applications.


Related research

06/28/2022 — Lessons Learned on MPI+Threads Communication
Hybrid MPI+threads programming is gaining prominence, but, in practice, ...

07/14/2020 — MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes
The advent of multi-/many-core processors in clusters advocates hybrid p...

02/07/2020 — Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs
Analytic, first-principles performance modeling of distributed-memory pa...

11/12/2020 — Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations
Asynchronous programming models (APM) are gaining more and more traction...

03/22/2019 — Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach
Computationally-intensive loops are the primary source of parallelism in...

01/10/2019 — Integrating Blocking and Non-Blocking MPI Primitives with Task-Based Programming Models
In this paper we present the Task-Aware MPI library (TAMPI) that integra...

08/01/2018 — CRUM: Checkpoint-Restart Support for CUDA's Unified Memory
Unified Virtual Memory (UVM) was recently introduced on recent NVIDIA GP...
