Rethinking Positional Encoding in Language Pre-training

06/28/2020
by Guolin Ke, et al.

How to explicitly encode positional information into neural networks is an important problem in natural language processing. In the Transformer model, positional information is either encoded as embedding vectors that are added to the word embeddings in the input layer, or encoded as a bias term in the self-attention module. In this work, we investigate the problems in these formulations and propose a new positional encoding method for BERT called Transformer with Untied Positional Encoding (TUPE). Unlike previous approaches, TUPE uses only the word embedding as input. In the self-attention module, the word correlation and the positional correlation are computed separately with different parameterizations and then added together. This design removes the noisy word-position correlation and, by using different projection matrices, gives the model more expressiveness to characterize the relationships between words and between positions. Furthermore, TUPE unties the [CLS] symbol from other positions, giving it a more specific role in capturing the global representation of the sentence. Extensive experiments and ablation studies on the GLUE benchmark demonstrate the effectiveness and efficiency of the proposed method: TUPE outperforms several baselines on almost all tasks by a large margin. In particular, it achieves a higher score than the baselines while using only 30% of the pre-training computational cost. We release our code at https://github.com/guolinke/TUPE.
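
The untied attention score described in the abstract can be illustrated with a minimal NumPy sketch. This is not the released TUPE code: the function name `untied_attention`, the argument names, the single-head layout, the 1/sqrt(2d) scaling, and the two scalars used to reset the [CLS] positional terms are all assumptions made here for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def untied_attention(x, p, Wq, Wk, Wv, Uq, Uk,
                     theta_from_cls=None, theta_to_cls=None):
    """Single-head attention with separate word and position terms.

    x: (n, d) word embeddings, with [CLS] assumed to sit at index 0.
    p: (n, d) positional embeddings, used only inside the attention score.
    Wq, Wk, Wv: (d, d) projections for the word term (hypothetical names).
    Uq, Uk: (d, d) projections for the positional term (hypothetical names).
    theta_from_cls, theta_to_cls: optional scalars that replace the
        positional term for attention from and to [CLS] (assumed form).
    """
    d = Wq.shape[1]
    # Assumption: each of the two terms is scaled by 1/sqrt(2d) so that
    # their sum keeps roughly the scale of one standard attention score.
    scale = 1.0 / np.sqrt(2 * d)

    word_scores = (x @ Wq) @ (x @ Wk).T * scale   # word-to-word correlation
    pos_scores = (p @ Uq) @ (p @ Uk).T * scale    # position-to-position correlation

    if theta_from_cls is not None and theta_to_cls is not None:
        # Untie [CLS]: its positional terms no longer depend on p.
        pos_scores = pos_scores.copy()
        pos_scores[:, 0] = theta_to_cls     # other tokens attending to [CLS]
        pos_scores[0, :] = theta_from_cls   # [CLS] attending to every token

    weights = softmax(word_scores + pos_scores, axis=-1)
    # Values are computed from word embeddings only; p never enters the input.
    return weights @ (x @ Wv), weights

rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))
p = rng.normal(size=(n, d))
Wq, Wk, Wv, Uq, Uk = (rng.normal(size=(d, d)) for _ in range(5))
out, weights = untied_attention(x, p, Wq, Wk, Wv, Uq, Uk, 0.1, -0.1)
```

Because the positional embeddings enter only through `pos_scores`, no word-position cross term appears in the score, and the word and positional correlations each get their own projection matrices.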

