Accelerating the computation of FLAPW methods on heterogeneous architectures

12/19/2017
by Davor Davidović, et al.

Legacy codes in computational science and engineering have been very successful in providing essential functionality to researchers. However, they are not capable of exploiting the massive parallelism provided by emerging heterogeneous architectures. The lack of portable performance and scalability puts them at high risk: either they evolve or they are doomed to disappear. One example of a legacy code that would benefit heavily from a modern design is FLEUR, a software package for electronic structure calculations. In previous work, the computational bottleneck of FLEUR was partially re-engineered to have a modular design that relies on standard building blocks, namely BLAS and LAPACK. In this paper, we demonstrate how that initial redesign enables portability to heterogeneous architectures. More specifically, we study different approaches to porting the code to architectures consisting of multi-core CPUs equipped with one or more coprocessors, such as Nvidia GPUs and Intel Xeon Phis. Our final code attains over 70% of the architectures' peak performance and outperforms Nvidia's and Intel's libraries. Finally, on JURECA, the supercomputer where FLEUR is often executed, the code takes advantage of the full power of the computing nodes, attaining a 5× speedup over using the CPUs alone.

research
10/31/2016

Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods

In this paper we focus on the integration of high-performance numerical ...
research
02/19/2020

Honing and proofing Astrophysical codes on the road to Exascale. Experiences from code modernization on many-core systems

The complexity of modern and upcoming computing architectures poses seve...
research
07/12/2019

Simulating Nonlinear Neutrino Oscillations on Next-Generation Many-Core Architectures

In this work an astrophysical simulation code, XFLAT, is developed to st...
research
09/24/2018

Software for Sparse Tensor Decomposition on Emerging Computing Architectures

In this paper, we develop software for decomposing sparse tensors that i...
research
06/15/2020

Solving the Bethe-Salpeter equation on massively parallel architectures

The last ten years have witnessed fast spreading of massively parallel c...
research
11/19/2022

Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment

Background and objectives. The computational biology area is growing up ...
research
09/09/2023

Towards Accelerating High-Order Stencils on Modern GPUs and Emerging Architectures with a Portable Framework

PDE discretization schemes yielding stencil-like computing patterns are ...
