Combinatorial Register Allocation and Instruction Scheduling

This paper introduces a combinatorial optimization approach to register allocation and instruction scheduling, two central compiler problems. Combinatorial optimization has the potential to solve these problems optimally and to exploit processor-specific features readily. Our approach is the first to leverage this potential in practice: it captures the complete set of program transformations used in state-of-the-art compilers, scales to medium-sized functions of up to 1000 instructions, and generates executable code. This level of practicality is reached by using constraint programming, a particularly suitable combinatorial optimization technique. Unison, the implementation of our approach, is open source, used in industry, and integrated with the LLVM toolchain. An extensive evaluation of estimated speed, code size, and scalability confirms that Unison generates better code than LLVM while scaling to medium-sized functions. The evaluation uses systematically selected benchmarks from MediaBench and SPEC CPU2006 and different processor architectures (Hexagon, ARM, MIPS). Mean estimated speedup ranges from 1 code size reduction ranges from 0.8 Executing the generated code on Hexagon confirms that the estimated speedup indeed results in actual speedup. Given a fixed time limit, Unison solves optimally functions of up to 647 instructions, delivers improved solutions for functions of up to 874 instructions, and achieves more than 85 potential speed for 90 The results in this paper show that our combinatorial approach can be used in practice to trade compilation time for code quality beyond the usual compiler optimization levels, fully exploit processor-specific features, and identify improvement opportunities in existing heuristic algorithms.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/21/2018

Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks

Statically estimating the number of processor clock cycles it takes to e...
research
01/16/2019

Realize special instructions on clustering VLIW DSP: multiplication-accumulation instruction

BWDSP is a 32bit static scalar digital signal processor with VLIW and SI...
research
02/07/2019

Revec: Program Rejuvenation through Revectorization

Modern microprocessors are equipped with Single Instruction Multiple Dat...
research
07/23/2021

Mitigating Power Attacks through Fine-Grained Instruction Reordering

Side-channel attacks are a security exploit that take advantage of infor...
research
06/27/2021

A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation

This paper presents a methodology for using LLVM-based tools to tune the...
research
11/11/2020

Efficient global register allocation

In a compiler, an essential component is the register allocator. Two mai...
research
06/03/2019

A scheme for dynamically integrating C library functions into a λProlog implementation

The Teyjus system realizes the higher-order logic programming languageλP...

Please sign up or login with your details

Forgot password? Click here to reset