Parallel QR Factorization of Block Low-Rank Matrices

08/12/2022 ∙ by M. Ridwan Apriansyah, et al.

We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR, and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR structure to achieve an arithmetic complexity of O(mn), while the tiled BLR-QR exhibits O(mn^1.5) complexity. However, the tiled BLR-QR has finer task granularity, which allows parallel task-based execution on shared-memory systems. We compare the block-column-wise BLR-QR using fork-join parallelism with the tiled BLR-QR using task-based parallelism. We also compare these two implementations of Householder BLR-QR with a block-column-wise Modified Gram-Schmidt (MGS) BLR-QR using fork-join parallelism, and with a state-of-the-art vendor-optimized dense Householder QR in Intel MKL. For a matrix of size 131k × 65k, all BLR methods are more than an order of magnitude faster than the dense QR in MKL. Our methods are also robust to ill-conditioning and produce better orthogonal factors than the existing MGS-based method. On a CPU with 64 cores, our parallel tiled Householder and block-column-wise Householder algorithms show speedups of 50 and 37 times, respectively.
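The claim about orthogonality under ill-conditioning can be illustrated on a small dense matrix. The following is a minimal sketch, not the BLR algorithms from the paper: it compares a textbook Modified Gram-Schmidt QR with NumPy's `numpy.linalg.qr`, which calls LAPACK's Householder-based routines. The helper names (`mgs_qr`, `orthogonality_error`, `ill_conditioned`), the matrix size, and the condition number 1e10 are all assumptions chosen for illustration.

```python
import numpy as np


def mgs_qr(a):
    """Modified Gram-Schmidt QR of a dense m x n matrix with m >= n."""
    n = a.shape[1]
    q = a.astype(float).copy()
    r = np.zeros((n, n))
    for j in range(n):
        r[j, j] = np.linalg.norm(q[:, j])
        q[:, j] /= r[j, j]
        # Project the new direction out of all remaining columns immediately.
        r[j, j + 1:] = q[:, j] @ q[:, j + 1:]
        q[:, j + 1:] -= np.outer(q[:, j], r[j, j + 1:])
    return q, r


def orthogonality_error(q):
    """Frobenius norm of Q^T Q - I; zero for exactly orthonormal columns."""
    return np.linalg.norm(q.T @ q - np.eye(q.shape[1]))


def ill_conditioned(m, n, cond, seed=0):
    """Hypothetical test matrix with singular values from 1 down to 1/cond."""
    rng = np.random.default_rng(seed)
    u, _ = np.linalg.qr(rng.standard_normal((m, n)))
    v, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(cond), n)
    return (u * s) @ v.T


a = ill_conditioned(200, 50, cond=1e10)
q_mgs, r_mgs = mgs_qr(a)
q_hh, r_hh = np.linalg.qr(a)  # Householder QR via LAPACK

err_mgs = orthogonality_error(q_mgs)
err_hh = orthogonality_error(q_hh)
print(f"MGS         ||Q^T Q - I||_F = {err_mgs:.2e}")
print(f"Householder ||Q^T Q - I||_F = {err_hh:.2e}")
```

In this dense setting, the loss of orthogonality of MGS is expected to grow with the condition number, while the Householder factor stays orthogonal to roughly machine precision. Whether the same gap appears in the BLR variants depends on details given in the full text.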


