Improved Variational Neural Machine Translation by Promoting Mutual Information

09/19/2019
by Arya D. McCarthy, et al.

Posterior collapse plagues VAEs for text, especially in conditional text generation with strong autoregressive decoders. In this work, we address this problem in variational neural machine translation by explicitly promoting mutual information between the latent variables and the data. Our model extends the conditional variational autoencoder (CVAE) with two new ingredients: first, we propose a modified evidence lower bound (ELBO) objective that explicitly promotes mutual information; second, we regularize the decoder's probabilities by mixing in an auxiliary factorized distribution that is predicted directly from the latent variables. We present empirical results on the Transformer architecture and show that the proposed model effectively addresses posterior collapse: the latent variables are no longer ignored in the presence of a powerful decoder. As a result, the proposed model yields improved translation quality while demonstrating superior data efficiency and robustness.
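To make the two ingredients concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract gives no formulas, so the diagonal-Gaussian form of the posterior and prior, the convex-combination form of the mixing, the weight `lam`, and the function names `diag_gaussian_kl` and `mix_probs` are all hypothetical. The first function computes the KL term of a standard CVAE ELBO, which falls to zero when the posterior collapses onto the prior. The second shows one simple way a decoder distribution could be mixed with an auxiliary distribution over the vocabulary.

```python
import math


def diag_gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians.

    Each argument is a list of per-dimension means or log-variances.
    Assumption: Gaussian posterior and prior, as is common for CVAEs.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def mix_probs(p_decoder, p_aux, lam):
    """Convex combination of two distributions over the same vocabulary.

    Assumption: the mixing is a weighted average with a hypothetical
    weight `lam` in [0, 1]; the abstract does not specify the form.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [(1.0 - lam) * pd + lam * pa for pd, pa in zip(p_decoder, p_aux)]


# A collapsed posterior matches the prior exactly, so the KL term is zero
# and the latent variable carries no information about the data.
kl_collapsed = diag_gaussian_kl([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])

# A posterior that moves away from the prior has a strictly positive KL.
kl_informative = diag_gaussian_kl([1.0], [0.0], [0.0], [0.0])

# Mixing keeps the result a valid distribution over the vocabulary.
mixed = mix_probs([0.7, 0.2, 0.1], [0.1, 0.1, 0.8], lam=0.5)
```

In this toy setting, a KL term that stays near zero throughout training is the usual symptom of posterior collapse. Because the auxiliary distribution depends on the latent variables, mixing it into the decoder output gives those variables a direct route to influence the prediction.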


