Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks

by Charith Mendis, et al.

Statically estimating the number of processor clock cycles it takes to execute a basic block of assembly instructions in steady state (throughput) is important for compiler backend optimizations such as register allocation, instruction selection and instruction scheduling. This is especially complicated on modern x86-64 Complex Instruction Set Computer (CISC) machines with sophisticated processor microarchitectures. Traditionally, compiler writers invest time experimenting and referring to processor manuals to analytically model modern processors with incomplete specifications. This is tedious and error prone, and must be redone for each processor generation. We present Ithemal, the first automatically learnt estimator to statically predict the throughput of a set of basic block instructions using machine learning. Ithemal uses a novel Directed Acyclic Graph-Recurrent Neural Network (DAG-RNN) based data-driven approach for throughput estimation. We show that Ithemal is more accurate than state-of-the-art hand-written tools used in compiler backends and static machine code analyzers. In particular, our model has a worst-case average error of 10.53% on actual throughput values, compared to best-case average errors of 19.57% for the LLVM scheduler (llvm-mca) and 22.51% for IACA, Intel's machine code analyzer, when compared on three different microarchitectures, while predicting throughput values at a faster rate than the aforementioned tools. We also show that Ithemal is portable, learning throughput estimation for Intel Nehalem, Haswell and Skylake microarchitectures without requiring changes to its structure.
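
The error figures quoted in the abstract are average errors of predicted throughput relative to measured throughput. As a minimal sketch, assuming the metric is the mean absolute relative error over a set of basic blocks (the abstract does not define it), such a figure could be computed as follows. The function name `mean_relative_error` and all numbers are hypothetical and chosen only for illustration.

```python
def mean_relative_error(predicted, measured):
    """Average of |predicted - measured| / measured over all basic blocks.

    Assumed metric for illustration; not taken from the paper's text.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return total / len(measured)


# Made-up throughput values (cycles per basic block), for illustration only.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

error = mean_relative_error(predicted, measured)
print(f"average error: {error:.2%}")
```

Under this assumed definition, a lower value means the estimator's predictions sit closer to the measured cycle counts on average.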




GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation

Analytical hardware performance models yield swift estimation of desired...

Combinatorial Register Allocation and Instruction Scheduling

This paper introduces a combinatorial optimization approach to register ...

Improved Basic Block Reordering

Basic block reordering is an important step for profile-guided binary op...

A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation

This paper presents a methodology for using LLVM-based tools to tune the...

Puppeteer: A Random Forest-based Manager for Hardware Prefetchers across the Memory Hierarchy

Over the years, processor throughput has steadily increased. However, th...

XDA: Accurate, Robust Disassembly with Transfer Learning

Accurate and robust disassembly of stripped binaries is challenging. The...

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution

We introduce the Coarse-Grain Out-of-Order (CG-OoO) general purpose pro...
