On the Parameterization and Initialization of Diagonal State Space Models

06/23/2022
by   Albert Gu, et al.
4

State space models (SSM) have recently been shown to be very effective as a deep learning layer as a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies by using a prescribed state matrix called the HiPPO matrix. While this has an interpretable mathematical mechanism for modeling long dependencies, it introduces a custom representation and algorithm that can be difficult to implement. On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix. This work seeks to systematically understand how to parameterize and initialize such diagonal state space models. While it follows from classical results that almost all SSMs have an equivalent diagonal form, we show that the initialization is critical for performance. We explain why DSS works mathematically, by showing that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension. We also systematically describe various design choices in parameterizing and computing diagonal SSMs, and perform a controlled empirical study ablating the effects of these choices. Our final model S4D is a simple diagonal version of S4 whose kernel computation requires just 2 lines of code and performs comparably to S4 in almost all settings, with state-of-the-art results for image, audio, and medical time-series domains, and averaging 85% on the Long Range Arena benchmark.

READ FULL TEXT
research
03/27/2022

Diagonal State Spaces are as Effective as Structured State Spaces

Modeling long range dependencies in sequential data is a fundamental ste...
research
06/24/2022

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

Linear time-invariant state space models (SSM) are a classical model fro...
research
12/01/2022

Simplifying and Understanding State Space Models with Diagonal Linear RNNs

Sequence models based on linear state spaces (SSMs) have recently emerge...
research
02/27/2023

Diagonal State Space Augmented Transformers for Speech Recognition

We improve on the popular conformer architecture by replacing the depthw...
research
10/17/2022

What Makes Convolutional Models Great on Long Sequence Modeling?

Convolutional models have been widely used in multiple domains. However,...
research
05/16/2023

Counterfactual Outcome Prediction using Structured State Space Model

Counterfactual outcome prediction in longitudinal data has recently gain...
research
06/27/2022

Long Range Language Modeling via Gated State Spaces

State space models have shown to be effective at modeling long range dep...

Please sign up or login with your details

Forgot password? Click here to reset