From micro-OPs to abstract resources: constructing a simpler CPU performance model through microbenchmarking

12/21/2020
by Nicolas Derumigny, et al.

In a super-scalar architecture, the scheduler dynamically assigns micro-operations (μops) to execution ports. The port mapping of an architecture describes how each instruction decomposes into μops and lists, for each μop, the set of ports it can be mapped to. Compilers and performance debugging tools use it to characterize the throughput of a sequence of instructions executed repeatedly as the body of a loop. This paper introduces a dual, equivalent representation: the resource mapping of an architecture is an abstract model in which an instruction, to be executed, must use a set of abstract resources, each representing a combination of execution ports. For a given architecture, finding a port mapping is an important but difficult problem. Building a resource mapping is more tractable and yields a simpler, equivalent model. This paper describes PALMED, a tool that automatically builds a resource mapping for pipelined, super-scalar, out-of-order CPU architectures. PALMED does not require hardware performance counters and relies solely on runtime measurements. We evaluate the pertinence of our dual representation for throughput modeling by extracting a representative set of basic blocks from the compiled binaries of the SPEC CPU 2017 benchmarks. We compare the throughput predicted by existing machine models to that predicted by PALMED and find accuracy comparable to state-of-the-art tools, with a mean square error rate below 10% on this workload on Intel's Skylake microarchitecture.
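To make the two representations concrete, here is a minimal Python sketch under stated assumptions. The toy port mapping, the instruction names, and the helper functions `resource_mapping` and `cycles_per_iteration` are hypothetical and are not taken from the paper or from PALMED, which infers its mapping from runtime measurements rather than from a known port mapping. The sketch only illustrates the idea that each abstract resource stands for a combination of ports, and that the steady-state cost of a loop body is set by its most loaded resource.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy port mapping (illustrative, not measured data):
# instruction -> list of µops, each µop given as the set of ports it may use.
PORT_MAPPING = {
    "ADD": [frozenset({"p0", "p1"})],
    "MUL": [frozenset({"p0"})],
    "STORE": [frozenset({"p1"}), frozenset({"p2"})],
}


def resource_mapping(port_mapping):
    """Derive a resource mapping from a known port mapping.

    Each abstract resource is a non-empty set of ports. An instruction uses
    resource r by (number of its µops that can only run on ports in r) / |r|.
    """
    ports = sorted(set().union(*(u for uops in port_mapping.values() for u in uops)))
    resources = [
        frozenset(combo)
        for size in range(1, len(ports) + 1)
        for combo in combinations(ports, size)
    ]
    mapping = {}
    for instr, uops in port_mapping.items():
        usage = {}
        for r in resources:
            confined = sum(1 for u in uops if u <= r)  # µops restricted to r
            if confined:
                usage[r] = Fraction(confined, len(r))
        mapping[instr] = usage
    return mapping


def cycles_per_iteration(kernel, res_map):
    """Steady-state cycles per loop iteration: load of the busiest resource.

    `kernel` maps instruction name -> number of occurrences in the loop body.
    """
    load = {}
    for instr, count in kernel.items():
        for r, use in res_map[instr].items():
            load[r] = load.get(r, 0) + count * use
    return max(load.values())


RES_MAP = resource_mapping(PORT_MAPPING)
print(cycles_per_iteration({"ADD": 2, "MUL": 1}, RES_MAP))
```

In this toy setting, two `ADD` and one `MUL` place three μops on the resource standing for ports `p0` and `p1` together, so that resource bounds the loop at 3/2 cycles per iteration. Enumerating every subset of ports is exponential and is used here only for clarity on a three-port example.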

research
09/04/2018

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

An accurate prediction of scheduling and execution of instruction stream...
research
10/01/2019

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

Useful models of loop kernel runtimes on out-of-order architectures requ...
research
04/21/2020

PMEvo: Portable Inference of Port Mappings for Out-of-Order Processors by Evolutionary Optimization

Achieving peak performance in a computer system requires optimizations i...
research
12/21/2019

Verifying x86 Instruction Implementations

Verification of modern microprocessors is a complex task that requires a...
research
12/21/2018

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

This paper proposes an efficient neural network (NN) architecture design...
research
10/10/2018

uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures

Modern microarchitectures are some of the world's most complex man-made ...
research
11/01/2022

Optimization of Oblivious Decision Tree Ensembles Evaluation for CPU

CatBoost is a popular machine learning library. CatBoost models are base...
