Closing the Performance Gap with Modern C++

by Thomas Heller et al.

On the way to Exascale, programmers face the growing challenge of supporting multiple hardware architectures from a single code base. At the same time, code and performance portability are increasingly difficult to achieve as hardware architectures become more and more diverse. Today's heterogeneous systems often combine two or more completely distinct and incompatible hardware execution models, such as GPGPUs, SIMD vector units, and general-purpose cores, which conventionally have to be programmed using separate tool chains representing non-overlapping programming models. The recent revival of interest in the C++ language across industry and the wider community has spurred a remarkable number of standardization proposals and technical specifications in the arena of concurrency and parallelism. These include a growing discussion around the need for a uniform, higher-level abstraction and programming model for parallelism in the C++ standard targeting heterogeneous and distributed computing. Such an abstraction should blend seamlessly with existing, already standardized language and library features, yet be generic enough to support future hardware developments. In this paper, we present the results of developing such a higher-level programming abstraction for parallelism in C++, which aims to enable code and performance portability across a wide range of architectures and for various types of parallelism. We present and compare performance data obtained from running the well-known STREAM benchmark ported to our higher-level C++ abstraction with the corresponding results from running it natively. We show that our abstractions enable performance at least as good as the comparable baseline benchmarks while providing a uniform programming API on all compared target architectures.


