Minimizing communication in the multidimensional FFT

03/22/2022
by Thomas Koopman, et al.

We present a parallel algorithm for the fast Fourier transform (FFT) in higher dimensions. This algorithm generalizes the cyclic-to-cyclic one-dimensional parallel algorithm to a cyclic-to-cyclic multidimensional parallel algorithm while retaining the property of needing only a single all-to-all communication step. This holds under the constraint that we use at most √N processors for an FFT on an array with a total of N elements, irrespective of the dimension d or the shape of the array. The only assumption we make is that N is sufficiently composite. Our algorithm starts and ends in the same distribution. We present our multidimensional implementation FFTU, which uses the sequential FFTW program for its local FFTs and can handle any dimension d. We obtain experimental results for d ≤ 5 using MPI on up to 4096 cores of the supercomputer Snellius, comparing FFTU with the parallel FFTW program and with PFFT. These results show that FFTU is competitive with the state of the art and that it allows the use of a larger number of processors, while keeping communication limited to a single all-to-all operation. For arrays of size 1024^3 and 64^5, FFTU achieves speedups of a factor 149 and 176, respectively, on 4096 processors.
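To make the one-dimensional starting point concrete, the following is a minimal sequential sketch of a cyclic-to-cyclic transform with a single all-to-all exchange. It is an illustration under stated assumptions, not the FFTU code: the p processors are simulated by Python lists, a naive O(n^2) DFT stands in for the local FFTW calls, and the function names `dft` and `cyclic_fft` are hypothetical. It assumes p^2 divides N, which is one way to satisfy the p ≤ √N constraint.

```python
import cmath


def dft(v):
    """Naive O(n^2) forward DFT; a stand-in for a local FFT (assumption)."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def cyclic_fft(x, p):
    """Simulate a cyclic-to-cyclic 1D DFT of x on p 'processors'.

    Returns out with out[t][q] == X[t + q*p], i.e. the result is again
    in the cyclic distribution.
    """
    n = len(x)
    assert n % (p * p) == 0, "sketch assumes p^2 divides n (so p <= sqrt(n))"
    m = n // p  # local vector length

    # Cyclic input distribution: processor s owns x[s], x[s+p], x[s+2p], ...
    local = [x[s::p] for s in range(p)]

    # Step 1 (local): length-m transform, then twiddle by w_n^(s*k1).
    for s in range(p):
        y = dft(local[s])
        local[s] = [
            y[k1] * cmath.exp(-2j * cmath.pi * s * k1 / n) for k1 in range(m)
        ]

    # Step 2 (the only communication): all-to-all exchange.
    # Processor s sends the entries with k1 = t (mod p) to processor t.
    recv = [[local[s][t::p] for s in range(p)] for t in range(p)]

    # Step 3 (local): length-p transforms across the received blocks.
    out = []
    for t in range(p):
        blocks = recv[t]  # blocks[s][i] holds the value for k1 = t + i*p
        res = [0j] * m
        for i in range(m // p):
            col = dft([blocks[s][i] for s in range(p)])
            for k2 in range(p):
                # Global index k = (t + i*p) + m*k2; local index (k - t) / p.
                res[i + (m // p) * k2] = col[k2]
        out.append(res)
    return out
```

Calling `cyclic_fft(x, 4)` on a length-32 input returns four local vectors of length 8, where entry `q` of vector `t` corresponds to global output index `t + 4*q`. Because p divides N/p, every output index produced on processor t is congruent to t modulo p, which is why no second redistribution is needed in this sketch.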
