Orthogonal layers of parallelism in large-scale eigenvalue computations

09/05/2022
by Andreas Alvermann et al.

We address the communication overhead of distributed sparse matrix-(multiple)-vector multiplication in the context of large-scale eigensolvers, using filter diagonalization as an example. The basis of our study is a performance model which includes a communication metric that is computed directly from the matrix sparsity pattern without running any code. The performance model quantifies to what extent scalability and parallel efficiency are lost due to communication overhead. To restore scalability, we identify two orthogonal layers of parallelism in the filter diagonalization technique. In the horizontal layer, the rows of the sparse matrix are distributed across individual processes. In the vertical layer, bundles of multiple vectors are distributed across separate process groups. An analysis in terms of the communication metric predicts that scalability can be restored if, and only if, one implements the two orthogonal layers of parallelism via different distributed vector layouts. Our theoretical analysis is corroborated by benchmarks for application matrices from quantum and solid state physics. Finally, we demonstrate the benefits of using orthogonal layers of parallelism with two exemplary application cases – an exciton and a strongly correlated electron system – which incur either small or large communication overhead.
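To illustrate the idea of a communication metric derived purely from the sparsity pattern, the sketch below estimates, for a block-row distribution of a sparse matrix across a given number of processes, how many distinct vector entries each process would have to receive from other processes in a sparse matrix-vector product y = A x. This is a minimal sketch under assumed conventions (a contiguous block-row partition, counting unique off-process column indices); the function name comm_volume and the toy matrix are illustrative and not taken from the paper, whose exact metric may be defined differently.

```python
# Sketch: estimate per-process communication volume for distributed SpMVM
# directly from the sparsity pattern, without running any distributed code.
# Assumption: rows of A and entries of x are split into contiguous blocks
# of (near) equal size, one block per process.
import numpy as np
import scipy.sparse as sp

def comm_volume(A: sp.csr_matrix, n_procs: int):
    """For each process, count the distinct entries of x it must fetch
    from other processes when computing y = A @ x."""
    n = A.shape[0]
    # Block boundaries: process p owns rows/entries [starts[p], starts[p+1]).
    starts = np.linspace(0, n, n_procs + 1, dtype=int)
    owner = np.searchsorted(starts, np.arange(n), side="right") - 1
    volumes = []
    for p in range(n_procs):
        block = A[starts[p]:starts[p + 1], :]   # rows owned by process p
        cols = np.unique(block.indices)         # columns touched by this block
        remote = cols[owner[cols] != p]         # x entries owned elsewhere
        volumes.append(len(remote))
    return volumes

if __name__ == "__main__":
    # Toy example: tridiagonal (nearest-neighbour) matrix, 4 processes.
    n = 1000
    A = sp.diags([np.ones(n - 1), -2 * np.ones(n), np.ones(n - 1)],
                 offsets=[-1, 0, 1], format="csr")
    print(comm_volume(A, 4))  # e.g. [1, 2, 2, 1]: only boundary entries are exchanged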
