Experience with PCIe streaming on FPGA for high throughput ML inferencing

10/22/2021
by Piyush Manavar, et al.

Achieving the maximum possible inference rate with minimal hardware resources plays a major role in reducing enterprise operational costs. In this paper we explore the use of PCIe streaming on FPGA-based platforms to achieve high throughput. PCIe streaming is a capability unique to FPGA platforms that eliminates memory copy overheads. We present results for inference on a gradient boosted trees model for online retail recommendations. We compare these results with popular library implementations on GPU and CPU platforms and observe that the PCIe streaming enabled FPGA implementation achieves the best overall measured performance. We also measure power consumption across all platforms and find that the PCIe streaming FPGA platform achieves 25x and 12x better energy efficiency than the implementations on CPU and GPU platforms, respectively. We discuss the conditions that must be met to achieve this kind of acceleration on the FPGA. Further, we analyze run time statistics on the GPU and FPGA and identify opportunities to enhance performance on both platforms.
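
Energy-efficiency comparisons such as "25x better than CPU" are ratios of work per unit energy. The sketch below assumes efficiency is measured as inferences per joule (throughput divided by average power draw); the abstract does not state the exact metric, so this definition, along with the names `PlatformMeasurement` and `efficiency_ratio`, is a hypothetical illustration rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformMeasurement:
    """Hypothetical record: measured throughput and average power for one platform."""
    name: str
    inferences_per_second: float
    average_power_watts: float

    def inferences_per_joule(self) -> float:
        # Assumed metric: (inferences / s) / (J / s) = inferences / J.
        if self.average_power_watts <= 0:
            raise ValueError("average power must be positive")
        return self.inferences_per_second / self.average_power_watts


def efficiency_ratio(candidate: PlatformMeasurement, baseline: PlatformMeasurement) -> float:
    """How many times more inferences per joule `candidate` delivers than `baseline`."""
    return candidate.inferences_per_joule() / baseline.inferences_per_joule()


if __name__ == "__main__":
    # Arbitrary placeholder values for illustration only; NOT measurements from the paper.
    accel = PlatformMeasurement("accelerator", inferences_per_second=1000.0, average_power_watts=10.0)
    host = PlatformMeasurement("host", inferences_per_second=1000.0, average_power_watts=100.0)
    print(f"{efficiency_ratio(accel, host):.1f}x")  # prints "10.0x"
```

Measuring energy per inference (joules per inference) instead gives the reciprocal per-platform value, so the ratio between two platforms is the same either way; what matters is that throughput and power are measured over the same workload and time window.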

research
12/22/2021

HP-GNN: Generating High Throughput GNN Training Implementation on CPU-FPGA Heterogeneous Platform

Graph Neural Networks (GNNs) have shown great success in many applicatio...
research
08/26/2019

AccD: A Compiler-based Framework for Accelerating Distance-related Algorithms on CPU-FPGA Platforms

As a promising solution to boost the performance of distance-related alg...
research
01/11/2022

High Throughput Multidimensional Tridiagonal Systems Solvers on FPGAs

We present a design space exploration for synthesizing optimized, high-t...
research
10/21/2022

Improving Energy Efficiency of Permissioned Blockchains Using FPGAs

Permissioned blockchains like Hyperledger Fabric have become quite popul...
research
05/26/2018

Time-Shared Execution of Realtime Streaming Pipelines by Dynamic Partial Reconfiguration

This paper presents an FPGA runtime framework that demonstrates the feas...
research
03/23/2023

Computing and Compressing Electron Repulsion Integrals on FPGAs

The computation of electron repulsion integrals (ERIs) over Gaussian-typ...
research
05/22/2018

CascadeCNN: Pushing the performance limits of quantisation

This work presents CascadeCNN, an automated toolflow that pushes the qua...