Automatic Parallelization of Sequential Programs

07/29/2018
by Peter Kraft, et al.

Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then speculatively executing those future computations. Although that prior work demonstrated scaling, it did not demonstrate speedup, because it ran entirely in emulation. We took this as a challenge to construct a prototype that embodies the ideas of ASC, works on a broader range of programs, and runs natively on hardware. The resulting system is similar in spirit to the original work, but differs in practically every respect. We present an implementation of the ASC architecture that runs natively on x86 hardware and achieves near-linear speedup up to 44 cores (the size of our test platform) for several classes of programs, such as computational kernels, map-style programs, and matrix operations. We observe that programs are either completely predictable, achieving near-perfect predictive accuracy, or totally unpredictable, and therefore not amenable to scaling via ASC-like techniques. We also find that in most cases, speedup is limited only by implementation details: the overhead of our dependency-tracking infrastructure and the manipulation of large state spaces. We are able to automatically parallelize programs with linked data structures that are not amenable to other forms of automatic parallelization.
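The core idea described above (predict future program states from a model of execution, speculatively execute from those predictions in parallel, and commit only the segments whose predictions turn out to be correct) can be illustrated with a minimal sketch. The code below is a toy under stated assumptions, not the paper's implementation: the names (`run_segment`, `predict_state`, `asc_like_run`) are hypothetical, the "model" is a closed-form prediction for a trivially predictable kernel, and the real system operates on native x86 machine state rather than Python values.

```python
# Toy sketch of ASC-style speculative parallelization (illustrative only).
from concurrent.futures import ProcessPoolExecutor

MOD = 1 << 32

def run_segment(state, steps):
    """Execute `steps` iterations of the sequential kernel from `state`."""
    x = state
    for _ in range(steps):
        x = (x + 3) % MOD          # stand-in for one step of the real program
    return x

def predict_state(state, steps):
    """Stand-in for the model of whole-program execution: this kernel is
    fully predictable, so the future state has a cheap closed form."""
    return (state + 3 * steps) % MOD

def asc_like_run(initial_state, total_steps, workers=4):
    seg = total_steps // workers
    # Predict the program state at each segment boundary...
    starts = [initial_state]
    for _ in range(workers - 1):
        starts.append(predict_state(starts[-1], seg))

    # ...then speculatively execute every segment in parallel.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_segment, starts, [seg] * workers))

    # Commit a speculative segment only if its predicted start state matches
    # the true end state of the preceding segment; otherwise fall back to
    # re-executing that segment sequentially.
    state = initial_state
    for start, end in zip(starts, results):
        state = end if start == state else run_segment(state, seg)
    return state

if __name__ == "__main__":
    assert asc_like_run(7, 1_000_000) == run_segment(7, 1_000_000)
    print(asc_like_run(7, 1_000_000))
```

For the "completely predictable" programs described in the abstract, every prediction matches and all segments run concurrently; for unpredictable programs, every segment falls back to sequential re-execution, mirroring the all-or-nothing behavior the authors report.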

