Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization

03/14/2018
by Farhad Merchant, et al.

We present an efficient realization of Generalized Givens Rotation (GGR) based QR factorization that achieves 3-100x better performance in terms of Gflops/watt than state-of-the-art realizations on multicore processors and General Purpose Graphics Processing Units (GPGPUs). GGR improves on the classical Givens Rotation (GR) operation: it can annihilate multiple elements of rows and columns of an input matrix simultaneously. GGR takes 33% fewer multiplications than GR. For a custom implementation of GGR, we identify macro operations in GGR and realize them on a Reconfigurable Data-path (RDP) tightly coupled to the pipeline of a Processing Element (PE). In the PE, GGR attains a speed-up of 1.1x over the Modified Householder Transform (MHT) presented in the literature. For a parallel realization of GGR, we use REDEFINE, a scalable, massively parallel Coarse-grained Reconfigurable Architecture, and show that the speed-up attained is commensurate with the hardware resources in REDEFINE. GGR also outperforms General Matrix Multiplication (gemm) by 10 Gflops/watt, which is counter-intuitive.
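For readers unfamiliar with the baseline, the following is a minimal sketch of QR factorization using classical Givens Rotation (GR), which annihilates one element per rotation. It is not the GGR algorithm from the paper, and the function name `givens_qr` and its list-of-lists interface are assumptions made purely for illustration.

```python
import math


def givens_qr(a):
    """QR factorization by classical Givens rotations.

    Illustrative baseline only: this is plain GR, not the paper's GGR.
    `a` is an m x n matrix given as a list of row lists; returns (q, r)
    with q orthogonal (m x m) and r upper triangular (m x n).
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(n):
        # Work up column j from the bottom, zeroing one element at a time.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # both entries already zero, nothing to rotate
            c, s = x / h, y / h

            # Apply the rotation to rows i-1 and i of R.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v

            # Accumulate the transposed rotation into columns i-1 and i of Q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v

            r[i][j] = 0.0  # remove rounding residue in the annihilated entry
    return q, r
```

Each rotation here touches only two rows and zeroes a single entry, so an m x n matrix needs one rotation per subdiagonal element; this one-element-at-a-time cost is what GGR, as described in the abstract, reduces by annihilating several elements at once.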


research
12/14/2016

Efficient Realization of Householder Transform through Algorithm-Architecture Co-design for Acceleration of QR Factorization

We present efficient realization of Householder Transform (HT) based QR ...
research
02/10/2018

Achieving Efficient Realization of Kalman Filter on CGRA through Algorithm-Architecture Co-design

In this paper, we present efficient realization of Kalman Filter (KF) th...
research
10/20/2016

Accelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design

Basic Linear Algebra Subprograms (BLAS) play key role in high performanc...
research
07/06/2017

Pipelined Parallel FFT Architecture

In this paper, an optimized efficient VLSI architecture of a pipeline Fa...
research
07/02/2020

GSoFa: Scalable Sparse LU Symbolic Factorization on GPUs

Decomposing a matrix A into a lower matrix L and an upper matrix U, whic...
research
11/06/2020

Mapping Stencils on Coarse-grained Reconfigurable Spatial Architecture

Stencils represent a class of computational patterns where an output gri...
research
08/06/2020

Design of Reconfigurable Multi-Operand Adder for Massively Parallel Processing

The paper presents a systematic study and implementation of a reconfigur...
