AIM: Accelerating Arbitrary-precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP

by Zhuoping Yang, et al.

Arbitrary-precision integer multiplication is the core kernel of many applications in simulation, cryptography, and other domains. Existing accelerators for arbitrary-precision integer multiplication include CPUs, GPUs, FPGAs, and ASICs. Among these, FPGAs promise both good energy efficiency and flexibility. Surprisingly, in our implementations, the FPGA has the lowest energy efficiency, i.e., 0.29x that of the CPU and 0.17x that of the GPU of the same fabrication generation. Key questions therefore arise: Where do the energy efficiency gains of CPUs and GPUs come from? Can reconfigurable computing do better? If so, how? We identify that the biggest energy efficiency gains of CPUs and GPUs come from their dedicated vector units. An FPGA composes the needed computation out of DSPs and lookup tables, which incurs overhead compared to using vector units directly. A new kind of reconfigurable computing, e.g., 'FPGA + vector units', is a novel and feasible way to improve energy efficiency. In this paper, we propose mapping arbitrary-precision integer multiplication onto such a heterogeneous platform, namely the AMD/Xilinx Versal ACAP architecture. Designing on Versal ACAP incurs several challenges, so we propose AIM: Arbitrary-precision Integer Multiplication on Versal ACAP, a framework that automates and optimizes the design. The AIM framework includes design space exploration and AIM automatic code generation to facilitate system design and verification. We deploy the AIM framework on three different applications, including large integer multiplication (LIM), RSA, and Mandelbrot, on the AMD/Xilinx Versal ACAP VCK190 evaluation board. Our experimental results show that AIM achieves up to 12.6x and 2.1x energy efficiency gains over the Intel Xeon Ice Lake 6346 CPU and the NVIDIA A5000 GPU, respectively, making reconfigurable computing the most energy-efficient of the three platforms.
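For readers unfamiliar with the kernel being accelerated: arbitrary-precision multiplication is typically computed over fixed-width "limbs" with carry propagation, as in the schoolbook method sketched below. This is a minimal illustrative sketch, not the paper's actual design; the 32-bit limb width and all function names here are assumptions chosen for clarity.

```python
# Schoolbook arbitrary-precision multiplication over 32-bit limbs
# (little-endian limb order, base B = 2**32). Illustrative sketch only;
# the AIM paper's actual kernel and data layout may differ.
B = 1 << 32

def from_int(n, nlimbs):
    """Split a non-negative integer into nlimbs little-endian 32-bit limbs."""
    return [(n >> (32 * k)) & (B - 1) for k in range(nlimbs)]

def to_int(limbs):
    """Reassemble little-endian 32-bit limbs into a Python integer."""
    return sum(limb << (32 * k) for k, limb in enumerate(limbs))

def mul_limbs(a, b):
    """Multiply two limb arrays; returns len(a) + len(b) product limbs."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # Partial product plus accumulated limb plus incoming carry
            # always fits in 64 bits, so the carry stays below B.
            t = out[i + j] + ai * bj + carry
            out[i + j] = t & (B - 1)
            carry = t >> 32
        out[i + len(b)] = carry  # this position is still zero here
    return out
```

This O(n^2) loop nest is the baseline; practical libraries switch to Karatsuba or FFT-based methods at larger sizes, and the parallel structure of the inner loop is what maps naturally onto vector units.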
