Coneheads: Hierarchy Aware Attention

06/01/2023
by Albert Tseng et al.

Attention networks such as transformers have achieved state-of-the-art performance in many domains. These networks rely heavily on the dot product attention operator, which computes the similarity between two points by taking their inner product. However, the inner product does not explicitly model the complex structural properties of real-world datasets, such as hierarchies between data points. To remedy this, we introduce cone attention, a drop-in replacement for dot product attention based on hyperbolic entailment cones. Cone attention associates two points by the depth of their lowest common ancestor in a hierarchy defined by hyperbolic cones, which intuitively measures the divergence of two points and gives a hierarchy-aware similarity score. We test cone attention on a wide variety of models and tasks and show that it improves task-level performance over dot product attention and other baselines, and can match dot product attention with significantly fewer parameters. Our results suggest that cone attention is an effective way to capture hierarchical relationships when calculating attention.
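To make the mechanism concrete, the sketch below shows how a cone-based score can replace the dot product inside standard attention. It is a minimal illustration, not the paper's implementation: it scores query-key pairs with the classic entailment-cone energy of Ganea et al. (2018), which is zero exactly when the key lies inside the hyperbolic cone rooted at the query, rather than the authors' lowest-common-ancestor depth score; the to_ball projection, the aperture constant K, and the temperature beta are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

K = 0.1    # cone aperture constant (Ganea et al., 2018); illustrative choice
EPS = 1e-6

def to_ball(x, min_norm=0.11, max_norm=0.99):
    """Illustrative projection of Euclidean features into the annulus of the
    Poincare ball where entailment cones are well defined (norm >= ~K)."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(EPS)
    scale = min_norm + (max_norm - min_norm) * torch.sigmoid(norm)
    return x / norm * scale

def cone_energy(x, y):
    """Entailment-cone energy E(x, y) = max(0, Xi(x, y) - psi(x)).
    Xi is the angle at x between the ray from the origin through x and the
    segment from x to y; psi is the half-aperture of the cone rooted at x.
    E is zero iff y lies inside x's cone, i.e. x "entails" y."""
    xn2 = (x * x).sum(-1).unsqueeze(-1)   # ||x||^2, broadcast over keys
    yn2 = (y * y).sum(-1).unsqueeze(-2)   # ||y||^2, broadcast over queries
    xy = torch.einsum('...qd,...kd->...qk', x, y)            # <x, y>
    diff = (xn2 + yn2 - 2 * xy).clamp_min(EPS).sqrt()        # ||x - y||
    num = xy * (1 + xn2) - xn2 * (1 + yn2)
    den = xn2.sqrt() * diff * (1 + xn2 * yn2 - 2 * xy).clamp_min(EPS).sqrt()
    xi = torch.acos((num / den).clamp(-1 + EPS, 1 - EPS))
    psi = torch.asin((K * (1 - xn2) / xn2.sqrt()).clamp(EPS, 1 - EPS))
    return F.relu(xi - psi)

def cone_attention(q, k, v, beta=1.0):
    """Drop-in replacement for softmax(QK^T / sqrt(d)) V: attention scores
    are negative cone energies, so keys inside a query's cone score highest."""
    qb, kb = to_ball(q), to_ball(k)
    scores = -beta * cone_energy(qb, kb)
    return F.softmax(scores, dim=-1) @ v

# Usage: shapes follow standard attention.
q = torch.randn(2, 8, 64)
k = torch.randn(2, 16, 64)
v = torch.randn(2, 16, 64)
out = cone_attention(q, k, v)   # (2, 8, 64)
```

Because the output has the same shape as dot product attention's, the function can be swapped in wherever scaled dot product scores are computed, which is the "drop-in replacement" property the abstract describes.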

Related research

02/21/2023  Hyena Hierarchy: Towards Larger Convolutional Language Models
Recent advances in deep learning have relied heavily on the use of large...

02/07/2023  FFHR: Fully and Flexible Hyperbolic Representation for Knowledge Graph Completion
Learning hyperbolic embeddings for knowledge graph (KG) has gained incre...

08/28/2019  Attention-based Fusion for Outfit Recommendation
This paper describes an attention-based fusion method for outfit recomme...

05/24/2018  Hyperbolic Attention Networks
We introduce hyperbolic attention networks to endow neural networks with...

04/18/2023  Hyperbolic Image-Text Representations
Visual and linguistic concepts naturally organize themselves in a hierar...

10/04/2018  Graph Embedding with Shifted Inner Product Similarity and Its Improved Approximation Capability
We propose shifted inner-product similarity (SIPS), which is a novel yet...

02/27/2019  Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities
We propose weighted inner product similarity (WIPS) for neural-network b...
