Performance Optimization and Parallelization of a Parabolic Equation Solver in Computational Ocean Acoustics on Modern Many-core Computer

by Min Xu, et al.

FOR3D, one of the open-source codes widely used in computational ocean acoustics, provides a very good estimate of underwater acoustic propagation. In this paper, we propose a performance optimization and parallelization scheme to speed up FOR3D. We used a variety of methods to improve overall performance: a multi-threaded programming model to exploit the many-core nodes of a high-performance computing (HPC) system, tuned compiler options, an efficiently tuned mathematical library, and vectorization instructions. In addition, we successfully extended the application from single-frequency to multi-frequency calculation by using hybrid OpenMP+MPI programming on a mainstream HPC platform. A detailed performance evaluation showed that the proposed parallelization achieved a speedup of 25.77X on a typical medium-sized three-dimensional case on the Tianhe-2 supercomputer. It also showed that the tuned parallel version exhibits weak scalability. The strategy described in this paper can greatly speed up the calculation of underwater sound fields. The method is applicable to other similar computing models in computational ocean acoustics, and it also serves as a guideline for improving the performance of scientific and engineering applications running on modern many-core computing platforms.
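The abstract does not say how frequencies are distributed among MPI processes in the multi-frequency extension. As a minimal sketch of one common choice, assuming each frequency can be solved independently, the Python snippet below splits a frequency list into contiguous blocks, one per rank, with block sizes differing by at most one. The function name `partition_frequencies` and the sample frequencies are hypothetical and are not taken from FOR3D or from the paper.

```python
def partition_frequencies(frequencies, n_ranks):
    """Split frequencies into n_ranks contiguous blocks whose sizes differ by at most one.

    Illustrative only: the paper's actual distribution scheme is not
    described in the abstract.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    base, extra = divmod(len(frequencies), n_ranks)
    blocks = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks each take one additional frequency.
        size = base + (1 if rank < extra else 0)
        blocks.append(list(frequencies[start:start + size]))
        start += size
    return blocks


if __name__ == "__main__":
    # Hypothetical source frequencies in Hz.
    freqs = [25.0, 50.0, 75.0, 100.0, 125.0, 150.0, 175.0]
    for rank, block in enumerate(partition_frequencies(freqs, 3)):
        print(f"rank {rank}: {block}")
```

In a hybrid setup of this kind, each rank would then work through its own block, with threads inside the rank sharing the work of a single-frequency solve.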


Specx: a C++ task-based runtime system for heterogeneous distributed architectures

Parallelization is needed everywhere, from laptops and mobile phones to ...

Evaluation of the Partitioned Global Address Space (PGAS) model for an inviscid Euler solver

In this paper we evaluate the performance of Unified Parallel C (which i...

Optimizing Irregular-Shaped Matrix-Matrix Multiplication on Multi-Core DSPs

General Matrix Multiplication (GEMM) has a wide range of applications in...

Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT

Stencil algorithms have been receiving considerable interest in HPC rese...

Automated Generation of High-Performance Computational Fluid Dynamics Codes

Domain-Specific Languages (DSLs) improve programmers productivity by dec...

A Hybrid MPI-CUDA Approach for Nonequispaced Discrete Fourier Transformation

Nonequispaced discrete Fourier transformation (NDFT) is widely applied i...

A Chebyshev-Tau spectral method for normal modes of underwater sound propagation with a layered marine environment

The normal mode model is one of the most popular approaches for solving ...
