Memformer: The Memory-Augmented Transformer

10/14/2020
by Qingyang Wu et al.

Transformer models have achieved remarkable success in various NLP tasks. However, they are inefficient on long sequences, as the complexity of their self-attention module scales quadratically with sequence length. To remedy this limitation, we present Memformer, a novel language model that utilizes a single unified memory to encode and retrieve past information. It includes a new optimization scheme, Memory Replay Back-Propagation, which enables long-range back-propagation through time at a significantly reduced memory cost. Memformer achieves 𝒪(n) time complexity and 𝒪(1) space complexity in processing long sequences, meaning that the model can handle arbitrarily long sequences during inference. Our model is also compatible with other self-supervised tasks that further improve performance on language modeling. Experimental results show that Memformer outperforms previous long-range sequence models on WikiText-103, including Transformer-XL and the Compressive Transformer.
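The abstract describes two mechanisms: a fixed-size external memory that each segment reads from and writes to, and Memory Replay Back-Propagation (MRBP), a checkpointing-style scheme that recomputes segment activations during the backward pass. The sketch below illustrates both ideas in PyTorch. It is a minimal sketch under assumptions, not the authors' implementation; the names MemformerBlock, mem_slots, memory_replay_backprop, and loss_fn are hypothetical.

```python
# Hypothetical sketch of a memory-augmented transformer step and of
# Memory Replay Back-Propagation; not the paper's actual code.
import torch
import torch.nn as nn

class MemformerBlock(nn.Module):
    """One segment step: read from a fixed-size memory, then write back.

    The memory has a constant number of slots, so the state carried
    across segments is O(1) in the total sequence length.
    Layer norms are omitted for brevity.
    """
    def __init__(self, d_model=512, n_heads=8, mem_slots=64):
        super().__init__()
        self.read = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.write = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x, memory):
        # Tokens attend to the memory slots to retrieve past information.
        read_out, _ = self.read(x, memory, memory)
        x = x + read_out
        x = x + self.ff(x)
        # Memory slots attend to the current segment to encode new information.
        new_memory, _ = self.write(memory, x, x)
        return x, new_memory

def memory_replay_backprop(model, segments, memory, loss_fn):
    """Checkpointing-style back-propagation through time over segments.

    The forward pass stores only the detached memory entering each
    segment; the backward pass replays the segments in reverse,
    recomputing activations so that at most one segment's computation
    graph is alive at any time.
    """
    # Forward: record the memory entering each segment, without graphs.
    mems = [memory.detach()]
    with torch.no_grad():
        for seg in segments:
            _, memory = model(seg, mems[-1])
            mems.append(memory.detach())

    # Backward: replay segments in reverse, chaining memory gradients.
    grad_mem = torch.zeros_like(memory)
    total_loss = 0.0
    for seg, mem_in in zip(reversed(segments), reversed(mems[:-1])):
        mem_in = mem_in.requires_grad_(True)
        out, mem_out = model(seg, mem_in)
        loss = loss_fn(out)  # loss_fn is a placeholder task loss
        total_loss += loss.item()
        # Combine the task loss with the gradient arriving from later segments.
        torch.autograd.backward([loss, mem_out], [None, grad_mem])
        grad_mem = mem_in.grad
    return total_loss
```

The design intuition, as this sketch frames it: the constant memory budget is what yields the 𝒪(1) space claim across segments, while MRBP trades extra forward computation for memory, much as gradient checkpointing does, which is how long-range back-propagation through time becomes affordable.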

Related Research

Compressive Transformers for Long-Range Sequence Modelling (11/13/2019)
We present the Compressive Transformer, an attentive sequence model whic...

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling (10/14/2022)
Transformer has achieved remarkable success in language, image, and spee...

DCT: Dynamic Compressive Transformer for Modeling Unbounded Sequence (10/10/2021)
In this paper, we propose Dynamic Compressive Transformer (DCT), a trans...

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding (09/13/2020)
Transformer has become ubiquitous in the deep learning field. One of the...

Updater-Extractor Architecture for Inductive World State Representations (04/12/2021)
Developing NLP models traditionally involves two stages - training and a...

Linearizing Transformer with Key-Value Memory Bank (03/23/2022)
Transformer has brought great success to a wide range of natural languag...

Exploring Attention Map Reuse for Efficient Transformer Neural Networks (01/29/2023)
Transformer-based deep neural networks have achieved great success in va...
