Variational Attention for Sequence-to-Sequence Models
The variational encoder-decoder (VED) uses a neural network to encode source information as a set of random variables, which are in turn decoded into target data by another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as the encoder-decoder networks. When a VED is combined with a traditional (deterministic) attention mechanism, the attention model may bypass the variational latent space, making the generated sentences less diverse. In this paper, we propose a variational attention mechanism for VED, in which the attention vector is also modeled as normally distributed random variables. Experiments show that variational attention increases the diversity of generated sentences while retaining high quality. We also show that the model is not sensitive to its hyperparameters.
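To make the idea of a normally distributed attention vector concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes dot-product attention scores, a posterior mean equal to the deterministic attention vector, a log-variance produced by a hypothetical linear map `W_logvar`, and a standard normal prior; none of these choices is stated in the abstract. Sampling uses the standard reparameterization trick.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def deterministic_attention(decoder_state, encoder_states):
    # Dot-product scores (an assumption; other score functions are possible).
    scores = encoder_states @ decoder_state      # shape (T,)
    weights = softmax(scores)                    # attention weights, sum to 1
    return weights @ encoder_states              # attention vector, shape (d,)


def variational_attention(decoder_state, encoder_states, W_logvar, rng):
    # Posterior mean: the deterministic attention vector (assumed here).
    mu = deterministic_attention(decoder_state, encoder_states)
    # Posterior log-variance from a hypothetical linear map W_logvar.
    logvar = W_logvar @ mu
    # Reparameterization: sample = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    sample = mu + np.exp(0.5 * logvar) * eps
    # KL divergence from N(mu, diag(sigma^2)) to the assumed prior N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return sample, mu, logvar, kl


# Illustrative usage with random toy inputs.
rng = np.random.default_rng(0)
T, d = 5, 4                                      # source length, hidden size
encoder_states = rng.standard_normal((T, d))
decoder_state = rng.standard_normal(d)
W_logvar = 0.1 * rng.standard_normal((d, d))
sample, mu, logvar, kl = variational_attention(
    decoder_state, encoder_states, W_logvar, rng
)
```

In a sketch like this, the decoder would consume `sample` instead of `mu` during training, and the `kl` term would be added to the training loss so that the attention vector cannot act as a purely deterministic shortcut around the latent space.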