DeFINE: DEep Factorized INput Word Embeddings for Neural Sequence Modeling

11/27/2019
by   Sachin Mehta, et al.

For sequence models with large word-level vocabularies, a majority of network parameters lie in the input and output layers. In this work, we describe a new method, DeFINE, for learning deep word-level representations efficiently. Our architecture uses a hierarchical structure with novel skip-connections which allows for the use of low dimensional input and output layers, reducing total parameters and training time while delivering similar or better performance versus existing methods. DeFINE can be incorporated easily in new or existing sequence models. Compared to state-of-the-art methods including adaptive input representations, this technique results in a 6% to 20% drop in perplexity. On WikiText-103, DeFINE reduces the total parameters of Transformer-XL by half with minimal impact on performance. On the Penn Treebank, DeFINE improves AWD-LSTM by 4 points with a 17% reduction in parameters, achieving comparable performance to state-of-the-art methods with fewer parameters. For machine translation, DeFINE improves the efficiency of the Transformer model by about 1.4 times while delivering similar performance.
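To make the factorization concrete, below is a minimal PyTorch sketch of the general idea: a small lookup table feeds a stack of widening transformations, with the shallow embedding reinjected at every stage as a skip connection. This is not the authors' implementation; the layer sizes are made up, and plain linear stages stand in for the paper's hierarchical group transformations.

```python
# Illustrative sketch of a DeFINE-style factorized embedding.
# Assumptions: plain linear layers replace the paper's hierarchical
# group transformations; all dimensions are invented for the example.
import torch
import torch.nn as nn

class DefineStyleEmbedding(nn.Module):
    def __init__(self, vocab_size, low_dim=128, model_dim=1024, depth=3):
        super().__init__()
        # Low-dimensional lookup table: vocab_size x low_dim instead of
        # vocab_size x model_dim, which is where the parameter savings live.
        self.lookup = nn.Embedding(vocab_size, low_dim)
        # Intermediate widths grow from low_dim up to model_dim.
        dims = torch.linspace(low_dim, model_dim, depth + 1).long().tolist()
        # Each stage widens the representation; its input also carries the
        # original low-dim embedding via a concatenation skip-connection.
        self.stages = nn.ModuleList(
            nn.Linear(dims[i] + low_dim, dims[i + 1]) for i in range(depth)
        )
        self.act = nn.GELU()

    def forward(self, token_ids):
        e = self.lookup(token_ids)          # (batch, seq, low_dim)
        x = e
        for stage in self.stages:
            # Reinject the shallow embedding at every level.
            x = self.act(stage(torch.cat([x, e], dim=-1)))
        return x                            # (batch, seq, model_dim)

emb = DefineStyleEmbedding(vocab_size=50000)
out = emb(torch.randint(0, 50000, (2, 16)))
print(out.shape)  # torch.Size([2, 16, 1024])
```

With these made-up sizes, the lookup table shrinks from 50,000 × 1,024 ≈ 51.2M weights to 50,000 × 128 = 6.4M, and the widening stages add only about 1.4M more, which is the kind of input-layer saving the abstract describes.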


Related research

- 08/31/2018 · Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation
  Tying the weights of the target word embeddings with the target word cla...

- 08/03/2020 · DeLighT: Very Deep and Light-weight Transformer
  We introduce a very deep and light-weight transformer, DeLighT, that del...

- 09/28/2018 · Adaptive Input Representations for Neural Language Modeling
  We introduce adaptive input representations for neural language modeling...

- 10/22/2019 · Depth-Adaptive Transformer
  State of the art sequence-to-sequence models perform a fixed number of c...

- 08/27/2019 · Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure
  The architecture of Transformer is based entirely on self-attention, and...

- 10/20/2016 · Neural Machine Translation with Characters and Hierarchical Encoding
  Most existing Neural Machine Translation models use groups of characters...

- 08/10/2023 · PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs
  In this paper, we develop a new method to automatically convert 2D line ...
