AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators

by Atefeh Sohrabizadeh, et al.

Adopting FPGAs as accelerators in datacenters is becoming mainstream for customized computing, but FPGAs are hard to program, which creates a steep learning curve for software programmers. Even with the help of high-level synthesis (HLS), accelerator designers still have to manually restructure code and perform cumbersome parameter tuning to achieve optimal performance. While existing work has leveraged many learning models to automate the design of efficient accelerators, the unpredictability of modern HLS tools is a major obstacle to keeping those models accurate. In this paper, we address this problem with an automated design space exploration (DSE) framework, AutoDSE, that leverages a bottleneck-guided gradient optimizer to systematically find a better design point. In each step, AutoDSE finds the bottleneck of the design and focuses on the high-impact parameters that can overcome it, which is similar to the approach an expert would take. The experimental results show that AutoDSE is able to find a design point that achieves, on the geometric mean, a 19.9x speedup over one CPU core on the MachSuite and Rodinia benchmarks and a 1.04x speedup over the manually designed, HLS-accelerated vision kernels in the Xilinx Vitis libraries, while using 26x fewer optimization pragmas. With less than one optimization pragma per design on average, we are making progress towards democratizing customizable computing by enabling software programmers to design efficient FPGA accelerators.
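
To make the bottleneck-guided idea concrete, the following is a minimal sketch and not the AutoDSE implementation. It assumes a toy cost model in which each loop's latency is its work divided by a parallel factor and each unit of parallelism costs one resource unit. All names (`WORK`, `OPTIONS`, `RESOURCE_BUDGET`, `evaluate`, `bottleneck_guided_search`) are hypothetical, and `evaluate` merely stands in for an HLS run. The sketch uses a plain greedy step (raise the factor of the slowest loop that can still be improved) rather than the gradient-based ranking the abstract mentions.

```python
# Hypothetical toy cost model (an assumption for illustration only):
# latency of a loop = work / parallel factor; resources = sum of factors.
WORK = {"load": 400, "compute": 1600, "store": 200}
OPTIONS = [1, 2, 4, 8, 16]      # allowed parallel factors per loop
RESOURCE_BUDGET = 24            # total parallel units available


def evaluate(config):
    """Stand-in for an HLS run: per-loop cycle counts and resource use."""
    cycles = {loop: WORK[loop] / config[loop] for loop in WORK}
    resources = sum(config.values())
    return cycles, resources


def bottleneck_guided_search(max_steps=20):
    """Greedily tune the parameter of the current bottleneck loop."""
    config = {loop: 1 for loop in WORK}
    history = []
    for _ in range(max_steps):
        cycles, _ = evaluate(config)
        history.append((dict(config), sum(cycles.values())))
        # Visit loops from slowest to fastest; the slowest is the bottleneck.
        for loop in sorted(cycles, key=cycles.get, reverse=True):
            idx = OPTIONS.index(config[loop])
            if idx + 1 == len(OPTIONS):
                continue  # this loop is already at its largest factor
            candidate = dict(config)
            candidate[loop] = OPTIONS[idx + 1]
            _, resources = evaluate(candidate)
            if resources <= RESOURCE_BUDGET:
                config = candidate  # accept the move and re-evaluate
                break
        else:
            break  # no loop can be improved within the budget
    return config, history


if __name__ == "__main__":
    best, trace = bottleneck_guided_search()
    print("final config:", best)
    print("latency: %.0f -> %.0f cycles" % (trace[0][1], trace[-1][1]))
```

In this sketch most of the resource budget ends up on the `compute` loop, because it stays the bottleneck for the most steps. A real explorer would replace `evaluate` with tool-reported cycle and resource estimates.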




