Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku

by Patrick Diehl, et al.

The increasing availability of machines relying on non-GPU architectures, such as ARM A64FX, in high-performance computing presents application developers with a set of interesting challenges. In addition to requiring code portability across different parallelization schemes, programs targeting these architectures must be highly adaptable in terms of compute kernel sizes to accommodate the different execution characteristics of various heterogeneous workloads. In this paper, we demonstrate an approach to code and performance portability that is based entirely on established industry standards. In addition to applying Kokkos as an abstraction over the execution of compute kernels on different heterogeneous execution environments, we show, using the real-world Octo-Tiger astrophysics application, that the standard C++ constructs exposed by the HPX runtime system enable superb portability in terms of both code and performance. We report our experience porting Octo-Tiger to the ARM A64FX architecture provided by Stony Brook's Ookami and RIKEN's Supercomputer Fugaku, and we compare the resulting performance with that achieved on well-established GPU-oriented HPC machines such as ORNL's Summit, NERSC's Perlmutter, and CSCS's Piz Daint. Octo-Tiger scaled well on Supercomputer Fugaku without any major code changes, owing to the abstraction levels provided by HPX and Kokkos. Adding vectorization support for ARM's SVE to Octo-Tiger was trivial thanks to the use of standard C++.


First Experiences in Performance Benchmarking with the New SPEChpc 2021 Suites

Modern HPC systems are built with innovative system architectures and no...

OpenCL Performance Prediction using Architecture-Independent Features

OpenCL is an attractive model for heterogeneous high-performance computi...

Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems

Heterogeneity has become a mainstream architecture design choice for bui...

Application Experiences on a GPU-Accelerated Arm-based HPC Testbed

This paper assesses and reports the experience of ten teams working to p...

Heterogeneous CPU/GPU co-execution of CFD simulations on the POWER9 architecture: Application to airplane aerodynamics

High fidelity Computational Fluid Dynamics simulations are generally ass...

Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading

HPC systems employ a growing variety of compute accelerators with differ...

Simplifying heterogeneous migration between x86 and ARM machines

Heterogeneous computing is the strategy of deploying multiple types of p...
