The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

by Hamid Reza Zohouri, et al.

Supported by their high power efficiency and recent advances in High Level Synthesis (HLS), FPGAs are quickly finding their way into HPC and cloud systems. Much work has been done on loop and area optimizations for various applications on FPGAs using HLS. However, a comprehensive analysis of the behavior and efficiency of the FPGA memory controller is missing from the literature. Such an analysis becomes even more crucial given the limited memory bandwidth of modern FPGAs compared to their GPU counterparts. In this work, we analyze the memory interface generated by the Intel FPGA SDK for OpenCL under different configurations of input/output arrays, vector size, interleaving, kernel programming model, on-chip channels, operating frequency, padding, and multiple types of overlapped blocking. Our results point to multiple shortcomings in the memory controller of Intel FPGAs, especially with respect to memory access alignment, that can hinder programmers' ability to maximize memory performance in their designs. For some of these cases, we provide work-arounds that improve memory bandwidth efficiency; however, a general solution will require major changes to the memory controller itself.
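To make the alignment issue concrete, the sketch below uses a deliberately simplified model: it assumes the external memory interface transfers data in aligned words of a fixed width (64 bytes is assumed here; this width and the helper names `words_touched` and `padded_row_bytes` are hypothetical and not taken from the paper or the SDK). Under that assumption, a vector access whose start is shifted off a word boundary covers one more memory word than an aligned one, and padding each array row to a multiple of the word size keeps row starts aligned. This is an illustration of the general idea, not the actual behavior of Intel's memory controller.

```python
# Simplified, hypothetical model of an FPGA external-memory interface:
# data is transferred in aligned words of MEM_WORD_BYTES bytes (64 B assumed).
MEM_WORD_BYTES = 64


def words_touched(addr: int, length: int) -> int:
    """Number of aligned memory words covered by a `length`-byte access at byte `addr`."""
    first = addr // MEM_WORD_BYTES
    last = (addr + length - 1) // MEM_WORD_BYTES
    return last - first + 1


def padded_row_bytes(row_bytes: int) -> int:
    """Round a row size up to a multiple of the memory word so every row starts aligned."""
    return -(-row_bytes // MEM_WORD_BYTES) * MEM_WORD_BYTES


if __name__ == "__main__":
    vec_bytes = 16 * 4  # a 16-wide float vector access = 64 bytes
    print(words_touched(0, vec_bytes))  # aligned start -> 1 word
    print(words_touched(4, vec_bytes))  # start shifted by one float -> 2 words
    print(padded_row_bytes(1000 * 4))   # 1000-float row padded to 4032 bytes
```

In this model, the misaligned access moves twice the data actually needed, which is one intuitive way to see why alignment and padding matter for bandwidth efficiency.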

