Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

by   Hamid Reza Zohouri, et al.

Recent developments in High Level Synthesis tools have attracted software programmers to accelerate their high-performance computing applications on FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms of performance for stencil computation, most previous work achieve this by avoiding spatial blocking and restricting input dimensions relative to FPGA on-chip memory. In this work we create a stencil accelerator using Intel FPGA SDK for OpenCL that achieves high performance without having such restrictions. We combine spatial and temporal blocking to avoid input size restrictions, and employ multiple FPGA-specific optimizations to tackle issues arisen from the added design complexity. Accelerator parameter tuning is guided by our performance model, which we also use to project performance for the upcoming Intel Stratix 10 devices. On an Arria 10 GX 1150 device, our accelerator can reach up to 760 and 375 GFLOP/s of compute performance, for 2D and 3D stencils, respectively, which rivals the performance of a highly-optimized GPU implementation. Furthermore, we estimate that the upcoming Stratix 10 devices can achieve a performance of up to 3.5 TFLOP/s and 1.6 TFLOP/s for 2D and 3D stencil computation, respectively.


High Performance Computing with FPGAs and OpenCL

In this work we evaluate the potential of FPGAs for accelerating HPC wor...

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

In this paper we evaluate the performance of FPGAs for high-order stenci...

Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL

High Level Synthesis (HLS) tools, like the Intel FPGA SDK for OpenCL, im...

AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs

Stencil computation is one of the most widely-used compute patterns in h...

High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers

This paper presents a workflow for synthesizing near-optimal FPGA implem...

The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

Supported by their high power efficiency and recent advancements in High...

SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-based FPGAs

Stencil computation is one of the fundamental computing patterns in many...

Please sign up or login with your details

Forgot password? Click here to reset