Area Attention

by Yang Li, et al.

Existing attention mechanisms are mostly item-based, in that a model is designed to attend to a single item in a collection of items (the memory). Intuitively, an area in the memory that may contain multiple items can be worth attending to as a whole. We propose area attention: a way to attend to an area of the memory, where each area contains a group of items that are either spatially adjacent when the memory has a 2-dimensional structure, such as images, or temporally adjacent for 1-dimensional memory, such as natural language sentences. Importantly, the size of an area, i.e., the number of items in an area, can vary depending on the learned coherence of the adjacent items. By giving the model the option to attend to an area of items, instead of only a single item, we hope attention mechanisms can better capture the nature of the task. Area attention can work alongside multi-head attention for attending to multiple areas in the memory. We evaluate area attention on two tasks, neural machine translation and image captioning, and improve upon strong (state-of-the-art) baselines in both cases. These improvements are obtainable with a basic form of area attention that is parameter-free. In addition to proposing the novel concept of area attention, we contribute an efficient way to compute it by leveraging the technique of summed area tables.
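To make the idea concrete, the following is a minimal sketch of parameter-free area attention over a 1-dimensional memory, not the authors' implementation. It assumes that an area's key is the mean of its item keys and that an area's value is the sum of its item values; the abstract does not spell out these choices. Prefix sums play the role of the summed area table in one dimension, so each area's sum costs a single subtraction. The function names, the `max_area_size` parameter, and the single-query dot-product scoring are all illustrative assumptions.

```python
import numpy as np

def area_features_1d(keys, values, max_area_size):
    """Build keys and values for every contiguous area of up to max_area_size items.

    Assumption (not stated in the abstract): area key = mean of item keys,
    area value = sum of item values.
    """
    n = keys.shape[0]
    # Prefix sums with a leading zero row: the 1-D analogue of a summed area table.
    key_prefix = np.vstack([np.zeros((1, keys.shape[1])), np.cumsum(keys, axis=0)])
    val_prefix = np.vstack([np.zeros((1, values.shape[1])), np.cumsum(values, axis=0)])

    area_keys, area_values, spans = [], [], []
    for size in range(1, max_area_size + 1):
        for start in range(n - size + 1):
            end = start + size
            # The sum over items [start, end) is one subtraction of prefix sums.
            area_keys.append((key_prefix[end] - key_prefix[start]) / size)
            area_values.append(val_prefix[end] - val_prefix[start])
            spans.append((start, end))
    return np.array(area_keys), np.array(area_values), spans

def area_attention_1d(query, keys, values, max_area_size=2):
    """Attend with one query over all areas; returns (output, weights, spans)."""
    area_keys, area_values, spans = area_features_1d(keys, values, max_area_size)
    logits = area_keys @ query
    logits = logits - logits.max()      # stabilize the softmax numerically
    weights = np.exp(logits)
    weights = weights / weights.sum()   # softmax over areas, not single items
    return weights @ area_values, weights, spans

# Hypothetical toy memory: 4 items with 2-dimensional keys and values.
keys = np.arange(8, dtype=float).reshape(4, 2)
values = np.arange(8, dtype=float).reshape(4, 2)
query = np.array([0.1, -0.1])
output, weights, spans = area_attention_1d(query, keys, values, max_area_size=2)
```

With `max_area_size=1` every area is a single item, so the sketch reduces to ordinary item-based dot-product attention. Larger values add candidate areas without introducing any learned parameters.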




