Novel Model-based Methods for Performance Optimization of Multithreaded 2D Discrete Fourier Transform on Multicore Processors
In this paper, we use multithreaded fast Fourier transforms provided in three highly optimized packages, FFTW-2.1.5, FFTW-3.3.7, and Intel MKL FFT, to present a novel model-based parallel computing technique as a very effective and portable method for optimization of scientific multithreaded routines for performance, especially in the current multicore era where the processors have abundant number of cores. We propose two optimization methods, PFFT-FPM and PFFT-FPM-PAD, based on this technique. They compute 2D-DFT of a complex signal matrix of size NxN using p abstract processors. Both algorithms take as inputs, discrete 3D functions of performance against problem size of the processors and output the transformed signal matrix. Based on our experiments on a modern Intel Haswell multicore server consisting of 36 physical cores, the average and maximum speedups observed for PFFT-FPM using FFTW-3.3.7 are 1.9x and 6.8x respectively and the average and maximum speedups observed using Intel MKL FFT are 1.3x and 2x respectively. The average and maximum speedups observed for PFFT-FPM-PAD using FFTW-3.3.7 are 2x and 9.4x respectively and the average and maximum speedups observed using Intel MKL FFT are 1.4x and 5.9x respectively.
READ FULL TEXT