Time-aware Large Kernel Convolutions

02/08/2020
by   Vasileios Lioutas, et al.

To date, most state-of-the-art sequence modelling architectures use attention to build generative models for language-based tasks. Some of these models use all the available sequence tokens to generate an attention distribution, which results in a time complexity of O(n^2). Alternatively, they utilize depthwise convolutions with softmax-normalized kernels of size k acting as a limited-window self-attention, resulting in a time complexity of O(k·n). In this paper, we introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel instead of using a fixed-size kernel matrix. This method yields a time complexity of O(n), effectively making the sequence encoding process linear in the number of tokens. We evaluate the proposed method on large-scale standard machine translation and language modelling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention- and convolution-based approaches.
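The O(n) claim can be illustrated with a small sketch. The abstract does not spell out the mechanism, so the code below is only one plausible reading, not the authors' implementation: it assumes each token comes with integer window sizes (the hypothetical `left` and `right` lists, which a real model would predict from the token itself) and uses prefix sums so that every window sum costs O(1) regardless of its size.

```python
from itertools import accumulate
from typing import List, Sequence


def adaptive_window_sum(
    x: Sequence[float],
    left: Sequence[int],
    right: Sequence[int],
) -> List[float]:
    """Sum x[i - left[i] .. i + right[i]] for every position i in O(n).

    `left` and `right` are hypothetical per-token window sizes; they stand
    in for whatever a learned model would predict for each token.
    """
    n = len(x)
    # prefix[j] holds x[0] + ... + x[j - 1], so prefix[0] == 0.
    prefix = [0.0] + list(accumulate(x))
    out = []
    for i in range(n):
        lo = max(0, i - left[i])       # clamp the window to the sequence start
        hi = min(n - 1, i + right[i])  # clamp the window to the sequence end
        out.append(prefix[hi + 1] - prefix[lo])  # O(1) work per token
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
left = [0, 1, 2, 0, 4]
right = [0, 1, 0, 1, 0]
print(adaptive_window_sum(x, left, right))  # [1.0, 6.0, 6.0, 9.0, 15.0]
```

Building the prefix sums takes one pass and each token then needs two lookups, so the total cost does not depend on how large the windows are. A fixed kernel of size k would instead touch k entries per token, which is where the O(k·n) figure for depthwise convolutions comes from.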

research
01/29/2019

Pay Less Attention with Lightweight and Dynamic Convolutions

Self-attention is a useful mechanism to build generative models for lang...
research
03/09/2021

Beyond Nyströmformer – Approximation of self-attention by Spectral Shifting

Transformer is a powerful tool for many natural language tasks which is ...
research
05/24/2019

Generative Flow via Invertible nxn Convolution

Flow-based generative models have recently become one of the most effici...
research
09/30/2020

Rethinking Attention with Performers

We introduce Performers, Transformer architectures which can estimate re...
research
12/03/2019

Multiscale Self Attentive Convolutions for Vision and Language Modeling

Self attention mechanisms have become a key building block in many state...
research
02/13/2023

Simple Hardware-Efficient Long Convolutions for Sequence Modeling

State space models (SSMs) have high performance on long sequence modelin...
research
01/25/2018

Generative Adversarial Networks using Adaptive Convolution

Most existing GANs architectures that generate images use transposed con...
