Pyramidal Recurrent Unit for Language Modeling

08/27/2018
by Sachin Mehta, et al.

LSTMs are powerful tools for modeling contextual information, as evidenced by their success at the task of language modeling. However, modeling contexts in very high dimensional space can lead to poor generalizability. We introduce the Pyramidal Recurrent Unit (PRU), which enables learning representations in high dimensional space with more generalization power and fewer parameters. PRUs replace the linear transformation in LSTMs with more sophisticated interactions, including pyramidal and grouped linear transformations. This architecture gives strong results on word-level language modeling while reducing the number of parameters significantly. In particular, PRU improves the perplexity of a recent state-of-the-art language model (Merity et al., 2018) by up to 1.3 points while learning 15-20% fewer parameters. For a similar number of parameters, PRU outperforms all previous RNN models that exploit different gating mechanisms and transformations. We provide a detailed examination of the PRU and its behavior on language modeling tasks. Our code is open-source and available at https://sacmehta.github.io/PRU/
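To make the abstract's description concrete, the sketch below illustrates in PyTorch what grouped and pyramidal linear transformations can look like. It is a minimal reading of the idea, not the authors' code: the class names, pooling scheme, and level count are assumptions for illustration, and the official implementation lives at the URL above.

```python
# Hypothetical sketch of the two transformations named in the abstract:
# grouped linear transformations and pyramidal transformations. Module
# names, pooling choices, and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedLinear(nn.Module):
    """Split the input into `groups` chunks, project each chunk with its
    own small weight matrix, and concatenate. The parameter count is
    roughly 1/groups of a single dense layer of the same overall size."""
    def __init__(self, in_features: int, out_features: int, groups: int):
        super().__init__()
        assert in_features % groups == 0 and out_features % groups == 0
        self.groups = groups
        self.projections = nn.ModuleList(
            [nn.Linear(in_features // groups, out_features // groups)
             for _ in range(groups)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = x.chunk(self.groups, dim=-1)
        return torch.cat(
            [p(c) for p, c in zip(self.projections, chunks)], dim=-1)

class PyramidalTransform(nn.Module):
    """Subsample the input vector at several scales (a 'pyramid' of
    resolutions), project each scale, and concatenate, so most weights
    operate on reduced dimensions."""
    def __init__(self, in_features: int, out_features: int, levels: int = 3):
        super().__init__()
        assert out_features % levels == 0
        assert in_features % (2 ** (levels - 1)) == 0
        self.projections = nn.ModuleList(
            [nn.Linear(in_features // (2 ** k), out_features // levels)
             for k in range(levels)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for k, proj in enumerate(self.projections):
            # Average-pool the feature vector to halve its size per level.
            sub = x if k == 0 else F.avg_pool1d(
                x.unsqueeze(1), kernel_size=2 ** k).squeeze(1)
            outs.append(proj(sub))
        return torch.cat(outs, dim=-1)

# Quick shape check: both modules map 1200 -> 1200 features.
x = torch.randn(32, 1200)
print(GroupedLinear(1200, 1200, groups=4)(x).shape)       # [32, 1200]
print(PyramidalTransform(1200, 1200, levels=3)(x).shape)  # [32, 1200]
```

In spirit, the parameter savings the abstract reports come from exactly this kind of substitution: one large dense weight matrix is replaced by several smaller ones acting on chunks or subsampled copies of the input.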


Related research

07/25/2017
Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks
In this paper, we introduce a novel type of Rectified Linear Unit (ReLU)...

04/19/2021
When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting
Since the seminal work of Mikolov et al. (2013a) and Bojanowski et al. (...

05/29/2021
Predictive Representation Learning for Language Modeling
To effectively perform the task of next-word prediction, long short-term...

08/30/2018
Direct Output Connection for a High-Rank Language Model
This paper proposes a state-of-the-art recurrent neural network (RNN) la...

09/07/2019
LAMAL: LAnguage Modeling Is All You Need for Lifelong Language Learning
Most research on lifelong learning (LLL) applies to images or games, but...

07/10/2018
Revisiting the Hierarchical Multiscale LSTM
Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art...

11/30/2019
Modeling German Verb Argument Structures: LSTMs vs. Humans
LSTMs have proven very successful at language modeling. However, it rema...
