An ECM-based energy-efficiency optimization approach for bandwidth-limited streaming kernels on recent Intel Xeon processors

09/12/2016
by Johannes Hofmann, et al.

We investigate an approach that combines low-level analysis and the execution-cache-memory (ECM) performance model with tuning of hardware parameters to lower the energy requirements of memory-bound applications. The ECM model is extended to account for software optimizations such as non-temporal stores. Proceeding in incremental steps, we use the ECM model to quantify analytically the impact of various single-core optimizations and to pinpoint microarchitectural improvements that are relevant to energy consumption. Using a 2D Jacobi solver as an example that can serve as a blueprint for other memory-bound applications, we evaluate our approach on the four most recent Intel Xeon E5 processors (Sandy Bridge-EP, Ivy Bridge-EP, Haswell-EP, and Broadwell-EP). We find that chip energy consumption can be reduced by a factor of 2.0 to 2.4 on the examined processors.
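For readers unfamiliar with the example kernel, the following is a minimal sketch of a 2D Jacobi sweep. It assumes the common five-point form, in which each interior point becomes the mean of its four neighbors; the abstract does not spell out the exact stencil. The names `jacobi_sweep` and `jacobi` are hypothetical and chosen for illustration only. The sketch shows the streaming access pattern that makes such a kernel bandwidth-limited: every sweep reads one grid and writes a second one, with very little arithmetic per point.

```python
def jacobi_sweep(src, dst):
    """One sweep: each interior point of dst becomes the mean of its four
    neighbors in src; boundary points are copied unchanged.
    (Assumed five-point form, not taken from the paper.)"""
    rows, cols = len(src), len(src[0])
    for i in range(rows):
        for j in range(cols):
            if 0 < i < rows - 1 and 0 < j < cols - 1:
                dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                    + src[i][j - 1] + src[i][j + 1])
            else:
                dst[i][j] = src[i][j]


def jacobi(grid, sweeps):
    """Run the given number of sweeps, swapping the two grids after each one.
    The input grid is left unmodified."""
    a = [row[:] for row in grid]
    b = [row[:] for row in grid]
    for _ in range(sweeps):
        jacobi_sweep(a, b)
        a, b = b, a
    return a


# Usage: a 3x3 grid with boundary values 1.0 and a single interior point.
grid = [[1.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0]]
result = jacobi(grid, 1)
```

Because the destination grid is only written and never read within a sweep, its stores are the natural candidates for the non-temporal stores mentioned in the abstract. Pure Python cannot express that optimization, so the sketch illustrates the data flow only.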

research
03/05/2018

On the accuracy and usefulness of analytic energy models for contemporary multicore processors

This paper presents refinements to the execution-cache-memory performanc...
research
10/31/2020

An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs

Complex applications running on multicore processors show a rich perform...
research
02/24/2017

An analysis of core- and chip-level architectural features in four generations of Intel server processors

This paper presents a survey of architectural features among four genera...
research
09/27/2017

Energy efficiency of finite difference algorithms on multicore CPUs, GPUs, and Intel Xeon Phi processors

In addition to hardware wall-time restrictions commonly seen in high-per...
research
05/15/2021

Comparison of HPC Architectures for Computing All-Pairs Shortest Paths. Intel Xeon Phi KNL vs NVIDIA Pascal

Today, one of the main challenges for high-performance computing systems...
research
12/25/2022

CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms

The rise of data-intensive applications exposed the limitations of conve...
research
10/09/2016

Doing Moore with Less -- Leapfrogging Moore's Law with Inexactness for Supercomputing

Energy and power consumption are major limitations to continued scaling ...
