Linear pretraining in recurrent mixture density networks

02/27/2023
by Hubert Normandin-Taillon, et al.

We present a method for pretraining a recurrent mixture density network (RMDN). We also propose a slight modification to the architecture of the RMDN-GARCH proposed by Nikolaev et al. [2012]. The pretraining method helps the RMDN avoid bad local minima during training and improves its robustness to the persistent NaN problem, as defined by Guillaumes [2017], which is often encountered with mixture density networks. This problem consists of frequently obtaining "Not a Number" (NaN) values during training. The proposed pretraining method resolves these issues by training the linear nodes in the hidden layer of the RMDN before introducing updates to the non-linear nodes. This approach improves the performance of the RMDN and ensures it surpasses that of the GARCH model, the RMDN's linear counterpart.
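The abstract does not spell out the training procedure, but the idea lends itself to a short sketch. Below is a minimal PyTorch illustration, assuming a simplified RMDN whose hidden layer mixes identity (linear) and tanh (non-linear) nodes, and a two-phase schedule in which only the parameters feeding the linear nodes receive gradient updates during pretraining. The class names, layer sizes, variance clamp, and gradient-masking scheme are illustrative assumptions, not the authors' RMDN-GARCH implementation.

```python
# A minimal sketch of linear-first pretraining for a recurrent mixture
# density network (RMDN). Everything below is an illustrative assumption,
# not the exact RMDN-GARCH architecture of Nikolaev et al. [2012].
import torch
import torch.nn as nn


class ToyRMDN(nn.Module):
    """Recurrent hidden layer with n_lin identity nodes and n_tanh tanh
    nodes; the head emits K-component Gaussian mixture parameters."""

    def __init__(self, n_lin=4, n_tanh=4, K=2):
        super().__init__()
        self.n_lin, self.K = n_lin, K
        h = n_lin + n_tanh
        self.inp = nn.Linear(1, h)              # input-to-hidden weights
        self.rec = nn.Linear(h, h, bias=False)  # hidden-to-hidden recurrence
        self.head = nn.Linear(h, 3 * K)         # mixture logits, means, log-variances

    def forward(self, x):                       # x: (T, B, 1)
        h = x.new_zeros(x.shape[1], self.inp.out_features)
        outs = []
        for t in range(x.shape[0]):
            a = self.inp(x[t]) + self.rec(h)
            # First n_lin pre-activations pass through unchanged (linear
            # nodes); the remainder go through tanh (non-linear nodes).
            h = torch.cat([a[:, :self.n_lin], torch.tanh(a[:, self.n_lin:])], dim=1)
            outs.append(self.head(h))
        return torch.stack(outs)                # (T, B, 3K)


def mdn_nll(params, y, K):
    """Gaussian-mixture negative log-likelihood; clamping the variance is
    one common guard against NaN values in MDN training."""
    logits, mu, log_var = params.split(K, dim=-1)
    var = log_var.exp().clamp_min(1e-6)
    log_pi = torch.log_softmax(logits, dim=-1)
    log_comp = -0.5 * ((y - mu) ** 2 / var + (2 * torch.pi * var).log())
    return -torch.logsumexp(log_pi + log_comp, dim=-1).mean()


def pretrain_linear(model, x, y, epochs=100, lr=1e-3):
    """Phase 1: train only the linear hidden nodes by zeroing the gradient
    rows that feed the tanh nodes before each optimiser step."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    n = model.n_lin
    for _ in range(epochs):
        opt.zero_grad()
        mdn_nll(model(x), y, model.K).backward()
        model.inp.weight.grad[n:].zero_()
        model.inp.bias.grad[n:].zero_()
        model.rec.weight.grad[n:].zero_()
        opt.step()


# Usage on toy data: pretrain the linear nodes first, then continue with
# ordinary full-network training (phase 2, not shown).
x = torch.randn(50, 8, 1)   # (time steps, batch, features)
y = torch.randn(50, 8, 1)   # next-step targets
model = ToyRMDN()
pretrain_linear(model, x, y)
```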
