tcFFT: Accelerating Half-Precision FFT through Tensor Cores

04/23/2021
by Binrui Li, et al.

The Fast Fourier Transform (FFT) is an essential tool in scientific and engineering computation. The increasing demand for mixed-precision FFT creates an opportunity to use half-precision floating-point (FP16) arithmetic for higher speed and energy savings. NVIDIA Tensor Cores, which specialize in lower precision, can deliver extremely high computational performance. However, their fixed computation pattern makes it hard to apply that computing power to FFT. We therefore developed tcFFT to accelerate FFT with Tensor Cores. tcFFT supports batched 1D and 2D FFTs of various sizes and exploits a set of optimizations to achieve high performance: 1) single-element manipulation on Tensor Core fragments to support the special operations needed by FFT; 2) a fine-grained data arrangement designed to coordinate with the GPU memory access pattern. We evaluated tcFFT and NVIDIA cuFFT across various sizes and dimensions on NVIDIA V100 and A100 GPUs. The results show that tcFFT outperforms cuFFT by 1.29x-3.24x on the V100 and by 1.10x-3.03x on the A100. tcFFT has great potential for mixed-precision scientific applications.
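To make the "fixed computation pattern" concrete: Tensor Cores perform real-valued matrix multiply-accumulate operations, so a Fourier transform has to be phrased as matrix products before they can help. The following is a minimal sketch of that general idea, not the tcFFT implementation. It computes a small batched DFT as four real matrix multiplications. The function names (`dft_matrix`, `real_matmul`, `batched_dft_via_real_matmuls`, `naive_dft`) are hypothetical, `real_matmul` merely stands in for a Tensor Core operation, and Python floats are double precision rather than FP16.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix F with F[j][k] = exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def real_matmul(a, b):
    """Plain real matrix product (hypothetical stand-in for a Tensor Core MMA)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for row in a]


def batched_dft_via_real_matmuls(batch):
    """DFT of every column of `batch` (n x m complex) using four real matmuls."""
    n, m = len(batch), len(batch[0])
    f = dft_matrix(n)
    # Split the complex operands into real and imaginary planes.
    f_re = [[z.real for z in row] for row in f]
    f_im = [[z.imag for z in row] for row in f]
    x_re = [[z.real for z in row] for row in batch]
    x_im = [[z.imag for z in row] for row in batch]
    # (A + iB)(C + iD) = (AC - BD) + i(AD + BC)
    ac = real_matmul(f_re, x_re)
    bd = real_matmul(f_im, x_im)
    ad = real_matmul(f_re, x_im)
    bc = real_matmul(f_im, x_re)
    return [[complex(ac[i][j] - bd[i][j], ad[i][j] + bc[i][j])
             for j in range(m)] for i in range(n)]


def naive_dft(signal):
    """Reference O(n^2) DFT of a single signal, used only for checking."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]
```

Each column of `batch` is one signal, so a batch of transforms becomes a single matrix product. An FFT built on this idea would apply such small matrix products stage by stage, with element-wise twiddle-factor multiplications in between; those element-wise steps are the kind of operation that does not fit the matrix-multiply pattern directly.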


