Adaptive Input Representations for Neural Language Modeling

09/28/2018
by   Alexei Baevski, et al.

We introduce adaptive input representations for neural language modeling, which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. There are several choices for how to factorize the input and output layers, and for whether to model words, characters, or sub-word units. We perform a systematic comparison of popular choices for a self-attentional architecture. Our experiments show that models equipped with adaptive embeddings are more than twice as fast to train as the popular character input CNN while having fewer parameters. We achieve a new state of the art on the WikiText-103 benchmark of 20.51 perplexity, improving the next best known result by 8.7 perplexity. On the Billion Word benchmark, we achieve a state of the art of 24.14 perplexity.
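To make "input representations of variable capacity" concrete, here is a minimal NumPy sketch of one way such an embedding layer could look. It is not the authors' implementation. It assumes that word ids are sorted by frequency, that the vocabulary is split into clusters at hypothetical `cutoffs`, that each successive cluster uses an embedding width reduced by a hypothetical `factor`, and that every cluster is mapped to the shared model dimension by its own linear projection. The class name `AdaptiveInput` and all parameter values are illustrative only.

```python
import numpy as np


class AdaptiveInput:
    """Illustrative variable-capacity embedding (assumed design, not the paper's code)."""

    def __init__(self, cutoffs, dim, factor=4, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Cluster i covers word ids in [bounds[i], bounds[i + 1]).
        self.bounds = [0] + list(cutoffs)
        self.tables = []
        self.projections = []
        for i in range(len(cutoffs)):
            size = self.bounds[i + 1] - self.bounds[i]
            # Assumption: rarer clusters get narrower embeddings.
            width = max(1, dim // factor**i)
            self.tables.append(rng.normal(0.0, width**-0.5, (size, width)))
            # Project every cluster up to the shared model dimension.
            self.projections.append(rng.normal(0.0, width**-0.5, (width, dim)))

    def __call__(self, ids):
        ids = np.asarray(ids)
        out = np.zeros(ids.shape + (self.dim,))
        for i, (table, proj) in enumerate(zip(self.tables, self.projections)):
            lo, hi = self.bounds[i], self.bounds[i + 1]
            mask = (ids >= lo) & (ids < hi)
            if mask.any():
                # Look up within the cluster, then project to `dim`.
                out[mask] = table[ids[mask] - lo] @ proj
        return out

    def num_parameters(self):
        return sum(t.size + p.size for t, p in zip(self.tables, self.projections))


# Toy vocabulary of 1000 words split into three frequency clusters.
emb = AdaptiveInput(cutoffs=[20, 100, 1000], dim=16)
vectors = emb([[0, 25, 999]])  # one word from each cluster
```

In this toy setting the layer holds far fewer parameters than a dense 1000 x 16 embedding table, which is the kind of saving the abstract alludes to when it mentions a lower parameter count.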

research
11/04/2016

Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

Recurrent neural networks have been very successful at predicting sequen...
research
10/13/2016

Compressing Neural Language Models by Sparse Word Representations

Neural networks are among the state-of-the-art techniques for language m...
research
03/22/2018

An Analysis of Neural Language Modeling at Multiple Scales

Many of the leading approaches in language modeling introduce novel, com...
research
11/27/2019

DeFINE: DEep Factorized INput Word Embeddings for Neural Sequence Modeling

For sequence models with large word-level vocabularies, a majority of ne...
research
09/06/2017

A Neural Language Model for Dynamically Representing the Meanings of Unknown Words and Entities in a Discourse

This study addresses the problem of identifying the meaning of unknown w...
research
08/14/2018

Improved Language Modeling by Decoding the Past

Highly regularized LSTMs that model the auto-regressive conditional fact...
research
12/22/2021

The Importance of the Current Input in Sequence Modeling

The last advances in sequence modeling are mainly based on deep learning...
