SKaMPI-OpenSHMEM: Measuring OpenSHMEM Communication Routines

02/18/2022
by Camille Coti, et al.

Benchmarking is an important challenge in HPC, particularly for tuning the basic building blocks of the software environment used by applications. The communication library and distributed run-time environment are among the most critical of these. Many of the routines provided by communication libraries can be adjusted through parameters such as buffer sizes and the choice of communication algorithm. Accurately measuring the time taken by these routines is therefore crucial for optimizing them and achieving the best performance. For instance, the SKaMPI library was designed to measure the time taken by MPI routines, relying on MPI's two-sided communication model to measure one-sided and two-sided peer-to-peer communication and collective routines. In this paper, we discuss the benchmarking challenges specific to OpenSHMEM's communication model, chiefly the need to avoid inter-call pipelining and overlapping when measuring the time taken by its routines. We extend SKaMPI to OpenSHMEM for this purpose and demonstrate measurement algorithms that address OpenSHMEM's communication model in practice. Scaling experiments are run on the Summit platform to compare different benchmarking approaches on the SKaMPI benchmark operations. These show the advantages of our techniques for more accurate performance characterization.


research
09/13/2019

AITuning: Machine Learning-based Tuning Tool for Run-Time Communication Libraries

In this work, we address the problem of tuning communication libraries b...
research
05/27/2021

Measuring OpenSHMEM Communication Routines with SKaMPI-OpenSHMEM User's manual

This document presents the OpenSHMEM extension for the Special Karlsruhe...
research
10/26/2020

Leveraging MPI RMA to optimise halo-swapping communications in MONC on Cray machines

Remote Memory Access (RMA), also known as single sided communications, p...
research
05/29/2017

Increasing the Efficiency of Sparse Matrix-Matrix Multiplication with a 2.5D Algorithm and One-Sided MPI

Matrix-matrix multiplication is a basic operation in linear algebra and ...
research
10/09/2018

Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks

In this work, we consider the integration of MPI one-sided communication...
research
03/28/2021

MT-lib: A Topology-aware Message Transfer Library for Graph500 on Supercomputers

We present MT-lib, an efficient message transfer library for messages ga...
research
08/21/2022

IAAT: A Input-Aware Adaptive Tuning framework for Small GEMM

GEMM with the small size of input matrices is becoming widely used in ma...
