Approximation theory of transformer networks for sequence modeling

05/29/2023
by Haotian Jiang, et al.

The transformer is a widely applied architecture for sequence modeling, but the theoretical understanding of its working principles is limited. In this work, we investigate the ability of transformers to approximate sequential relationships. We first prove a universal approximation theorem for the transformer hypothesis space. From its derivation, we identify a novel notion of regularity under which we can prove an explicit approximation rate estimate. This estimate reveals key structural properties of the transformer and suggests the types of sequence relationships the transformer is well adapted to approximating. In particular, it allows us to concretely discuss the differences in structural bias between the transformer and classical sequence modeling methods, such as recurrent neural networks. Our findings are supported by numerical experiments.
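For context, the hypothesis space in question is built from attention layers; the following is the standard single-head self-attention formula, with W_Q, W_K, W_V denoting the query, key, and value matrices (the paper's exact parameterization may differ from this generic form):

\[
\mathrm{Attn}(x)_s \;=\; \sum_{t} \frac{\exp\big(\langle W_Q x_s,\, W_K x_t\rangle\big)}{\sum_{t'} \exp\big(\langle W_Q x_s,\, W_K x_{t'}\rangle\big)}\, W_V x_t .
\]

A transformer block composes such attention layers with position-wise feed-forward maps, and a universal approximation theorem for this space asserts that any target sequence-to-sequence relationship in a suitable class can be approximated to arbitrary accuracy by stacking and widening such blocks.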


Related research

Inverse Approximation Theory for Nonlinear Recurrent Neural Networks (05/30/2023)
We prove an inverse approximation theorem for the approximation of nonli...

Forward and Inverse Approximation Theory for Linear Temporal Convolutional Networks (05/29/2023)
We present a theoretical analysis of the approximation properties of con...

Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages (08/11/2022)
Machine translation has seen rapid progress with the advent of Transform...

Sumformer: Universal Approximation for Efficient Transformers (07/05/2023)
Natural language processing (NLP) made an impressive jump with the intro...

Scheduled Sampling for Transformers (06/18/2019)
Scheduled sampling is a technique for avoiding one of the known problems...

Energy Transformer (02/14/2023)
Transformers have become the de facto models of choice in machine learni...

Your Transformer May Not be as Powerful as You Expect (05/26/2022)
Relative Positional Encoding (RPE), which encodes the relative distance ...
