Randomized Positional Encodings Boost Length Generalization of Transformers

by Anian Ruoss et al.

Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply training on longer sequences is inefficient due to the quadratic computational complexity of the global attention mechanism. In this work, we demonstrate that this failure mode is linked to positional encodings being out-of-distribution for longer sequences (even for relative encodings) and introduce a novel family of positional encodings that can overcome this problem. Concretely, our randomized positional encoding scheme simulates the positions of longer sequences and randomly selects an ordered subset to fit the sequence's length. Our large-scale empirical evaluation of 6000 models across 15 algorithmic reasoning tasks shows that our method allows Transformers to generalize to sequences of unseen length (increasing test accuracy by 12.0% on average).
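The core idea described in the abstract can be illustrated with a short sketch: sample an ordered subset of positions from a much larger simulated range, then compute standard sinusoidal encodings at those (possibly large) positions. This is a minimal illustration, not the authors' implementation; the function names and the choice of `max_len` are illustrative assumptions.

```python
import numpy as np

def randomized_positions(seq_len, max_len=2048, rng=None):
    # Sample seq_len distinct positions from the simulated range [0, max_len)
    # and sort them, so position order still reflects token order.
    # max_len stands in for the maximum length to be simulated (an assumption).
    rng = rng or np.random.default_rng()
    positions = rng.choice(max_len, size=seq_len, replace=False)
    positions.sort()
    return positions

def sinusoidal_encoding(positions, d_model):
    # Standard sin/cos positional encoding, evaluated at arbitrary
    # (not necessarily contiguous) integer positions.
    angles = positions[:, None] / 10000 ** (np.arange(0, d_model, 2) / d_model)
    enc = np.zeros((len(positions), d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc
```

At test time, sequences longer than those seen in training still receive positions drawn from the same simulated range, so the encodings remain in-distribution.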


Related papers

- The Impact of Positional Encoding on Length Generalization in Transformers
- Length Generalization in Arithmetic Transformers
- Exploring Length Generalization in Large Language Models
- Big Bird: Transformers for Longer Sequences
- Giraffe: Adventures in Expanding Context Lengths in LLMs
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
- Monotonic Location Attention for Length Generalization
