LaMemo: Language Modeling with Look-Ahead Memory

04/15/2022
by Haozhe Ji, et al.

Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode the context in a uni-directional way. As a result, the memory cannot dynamically interact with the current context, which provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens and interpolating with the old memory states to maintain long-term information in the history. LaMemo embraces bi-directional attention and segment recurrence with additional computational overhead that is only linear in the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over baselines equipped with different types of memory.
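
The snippet below is a rough numpy sketch of the update described in the abstract: memory states attend to the tokens on their right (the current segment) and are then interpolated with the old memory states. The single attention head, the scalar interpolation weight alpha, and the names look_ahead_memory_update, memory, and segment are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def look_ahead_memory_update(memory, segment, alpha=0.5):
    """Refresh `memory` (M x d) by attending to the current `segment` (L x d),
    then interpolate with the old memory states to keep long-term history.

    alpha: interpolation weight for the newly attended states (a fixed scalar
    here for illustration; a learned, token-wise weighting is also possible).
    """
    d = memory.shape[-1]
    # Memory tokens act as queries; the current (right-side) segment provides
    # keys and values -- the "look-ahead" attention step.
    scores = memory @ segment.T / np.sqrt(d)        # (M, L)
    attended = softmax(scores, axis=-1) @ segment   # (M, d)
    # Interpolate the refreshed states with the old memory states.
    return alpha * attended + (1.0 - alpha) * memory

# Toy usage: a 4-token memory refreshed against an 8-token current segment.
rng = np.random.default_rng(0)
mem = rng.normal(size=(4, 16))
seg = rng.normal(size=(8, 16))
new_mem = look_ahead_memory_update(mem, seg)
print(new_mem.shape)  # (4, 16)
```

In this sketch each memory token attends to the current segment once per update, so the extra cost grows linearly with the memory length, which is consistent with the overhead claim in the abstract.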

Related research

09/01/2021 · ∞-former: Infinite Memory Transformer
Transformers struggle when attending to long contexts, since the amount ...

06/12/2023 · Augmenting Language Models with Long-Term Memory
Existing large language models (LLMs) can only afford fix-sized inputs d...

05/13/2021 · Not All Memories are Created Equal: Learning to Forget by Expiring
Attention mechanisms have shown promising results in sequence modeling t...

02/04/2021 · Adaptive Semiparametric Language Models
We present a language model that combines a large parametric neural netw...

01/25/2016 · Long Short-Term Memory-Networks for Machine Reading
In this paper we address the question of how to render sequence-level ne...

10/05/2021 · Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
Recent studies have demonstrated that the performance of transformers on...
