Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation

08/12/2022
by Edison Mucllari, et al.

In recent years, orthogonal matrices have emerged as a promising tool for improving the training, stability, and convergence of Recurrent Neural Networks (RNNs), particularly for controlling gradients. While Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) architectures address the vanishing gradient problem through a variety of gates and memory cells, they remain prone to the exploding gradient problem. In this work, we analyze the gradients in the GRU and propose using orthogonal matrices to prevent exploding gradients and enhance long-term memory. We study where orthogonal matrices should be applied and propose a Neumann series-based scaled Cayley transformation for training orthogonal matrices in the GRU, which we call the Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms the GRU as well as several other RNNs.
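For intuition: the Cayley transform maps a skew-symmetric matrix A (A^T = -A) to an orthogonal matrix W = (I + A)^{-1}(I - A), and the Neumann series (I + A)^{-1} = I - A + A^2 - ... replaces the explicit matrix inverse with cheap matrix products when the norm of A is small. Below is a minimal NumPy sketch of this idea; the function name neumann_cayley and the num_terms parameter are illustrative choices, and the sketch omits the diagonal scaling matrix used in scaled Cayley approaches, so it is not the paper's actual NC-GRU implementation.

```python
import numpy as np

def neumann_cayley(A, num_terms=5):
    """Approximate the Cayley transform W = (I + A)^{-1} (I - A)
    for skew-symmetric A, replacing the exact inverse with a
    truncated Neumann series (I + A)^{-1} ~= I - A + A^2 - ...
    W is approximately orthogonal when the norm of A is < 1."""
    n = A.shape[0]
    I = np.eye(n)
    inv_approx = np.eye(n)   # k = 0 term of the series
    power = np.eye(n)        # running (-A)^k
    for _ in range(1, num_terms):
        power = power @ (-A)
        inv_approx = inv_approx + power
    return inv_approx @ (I - A)

# Usage: build a small-norm skew-symmetric A and check near-orthogonality.
rng = np.random.default_rng(0)
B = rng.normal(scale=0.05, size=(4, 4))
A = B - B.T                          # skew-symmetric by construction
W = neumann_cayley(A, num_terms=5)
print(np.max(np.abs(W.T @ W - np.eye(4))))  # should be close to 0
```

The truncation trades exact orthogonality for a few matrix multiplications per step, which is the motivation for a Neumann-series approach inside a recurrent cell where the transform must be recomputed during training.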


Related Research

07/29/2017 · Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
Recurrent Neural Networks (RNNs) are designed to handle sequential data ...

01/18/2018 · Overcoming the vanishing gradient problem in plain recurrent networks
Plain recurrent networks greatly suffer from the vanishing gradient prob...

03/30/2020 · SiTGRU: Single-Tunnelled Gated Recurrent Unit for Abnormality Detection
Abnormality detection is a challenging task due to the dependence on a s...

05/28/2019 · Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
A recent strategy to circumvent the exploding and vanishing gradient pro...

11/25/2019 · Gating Revisited: Deep Multi-layer RNNs That Can Be Trained
We propose a new stackable recurrent cell (STAR) for recurrent neural ne...

05/09/2018 · Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum
LSTMs were introduced to combat vanishing gradients in simple RNNs by au...

12/15/2016 · Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs
Using unitary (instead of general) matrices in artificial neural network...
