Multi-branch Attentive Transformer

06/18/2020
by Yang Fan, et al.

While the multi-branch architecture is one of the key ingredients in the success of computer vision models, it has not been well investigated in natural language processing, especially in sequence learning tasks. In this work, we propose a simple yet effective variant of the Transformer called the multi-branch attentive Transformer (MAT for short), in which each attention layer is the average of multiple branches and each branch is an independent multi-head attention layer. We use two techniques to regularize training: drop-branch, which randomly drops individual branches during training, and proximal initialization, which uses a pre-trained Transformer model to initialize the multiple branches. Experiments on machine translation, code generation and natural language understanding demonstrate that this simple variant of the Transformer brings significant improvements. Our code is available at <https://github.com/HA-Transformer>.
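The three ideas in the abstract (averaging independent multi-head attention branches, drop-branch, and proximal initialization) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the weight layout, and the self-attention-only setting are hypothetical. Two details are assumptions the abstract does not state: surviving branches are rescaled by 1/(1 - drop_rate), in the style of dropout, and proximal initialization is modeled as copying one set of pre-trained weights into every branch.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_branch(d_model, rng):
    # One branch = one independent set of Q/K/V/output projections (hypothetical layout).
    return {name: rng.normal(0.0, d_model ** -0.5, (d_model, d_model))
            for name in ("wq", "wk", "wv", "wo")}


def multi_head_attention(x, p, n_heads):
    # Unmasked self-attention over x with shape (seq_len, d_model).
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ p["wq"]), split(x @ p["wk"]), split(x @ p["wv"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores) @ v  # (n_heads, seq_len, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ p["wo"]


def multi_branch_attention(x, branches, n_heads,
                           drop_rate=0.0, training=False, rng=None):
    # Average the outputs of all branches; optionally apply drop-branch.
    outs = np.stack([multi_head_attention(x, p, n_heads) for p in branches])
    if training and drop_rate > 0.0:
        # Drop each branch independently with probability drop_rate.
        keep = rng.random(len(branches)) >= drop_rate
        # Assumption: rescale survivors by 1 / (1 - drop_rate), as in dropout.
        scale = keep / (1.0 - drop_rate)
        outs = outs * scale[:, None, None]
    return outs.mean(axis=0)


def proximal_init(pretrained, n_branches):
    # Assumption: every branch starts as a copy of the pre-trained weights.
    return [{k: v.copy() for k, v in pretrained.items()}
            for _ in range(n_branches)]
```

Under these assumptions, a model built with `proximal_init` and evaluated without drop-branch produces exactly the output of the single pre-trained attention layer, because all branches are identical copies. The branches can only diverge once training perturbs them differently, for example through the random masks that drop-branch applies.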


