Training and Inference on Any-Order Autoregressive Models the Right Way

05/26/2022
by Andy Shih, et al.

Conditional inference on arbitrary subsets of variables is a core problem in probabilistic inference, with important applications such as masked language modeling and image inpainting. In recent years, the family of Any-Order Autoregressive Models (AO-ARMs) – which includes popular models such as XLNet – has shown breakthrough performance on arbitrary conditional tasks across a sweeping range of domains. Despite their success, however, in this paper we identify significant improvements to be made to previous formulations of AO-ARMs. First, we show that AO-ARMs suffer from redundancy in their probabilistic model: they define the same distribution in multiple different ways. We alleviate this redundancy by training on a smaller set of univariate conditionals that still maintains support for efficient arbitrary conditional inference. Second, we upweight the training loss for univariate conditionals that are evaluated more frequently during inference. Our method leads to improved performance with no compromises on tractability, giving state-of-the-art likelihoods in arbitrary conditional modeling on text (Text8), image (CIFAR10, ImageNet32), and continuous tabular data domains.
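For context, AO-ARMs are typically trained with the order-agnostic objective of Uria et al. (2014), which averages the autoregressive log-likelihood over uniformly random orderings of the D variables. As a minimal sketch in LaTeX notation (this is the standard objective from the literature, not a formula quoted from the paper):

    \mathcal{L}(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\,\mathbb{E}_{\sigma \sim \mathrm{Unif}(S_D)} \Big[ \sum_{t=1}^{D} \log p_\theta\big(x_{\sigma(t)} \mid x_{\sigma(<t)}\big) \Big]

Since every ordering \sigma must induce the same joint distribution, the model learns each univariate conditional p_\theta(x_i \mid x_S) under many redundant decompositions. The abstract's two fixes amount to (i) training on a reduced set of these conditionals that still supports arbitrary conditional queries and (ii) reweighting each term by how often it is evaluated at inference time. A schematic reweighted objective, where w(t) is illustrative shorthand and not the paper's exact weighting, would read:

    \mathcal{L}_w(\theta) = \mathbb{E}_{x}\,\mathbb{E}_{\sigma} \Big[ \sum_{t=1}^{D} w(t)\, \log p_\theta\big(x_{\sigma(t)} \mid x_{\sigma(<t)}\big) \Big]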

Related research:

02/23/2020 · Predictive Sampling with Forecasting Autoregressive Models
Autoregressive models (ARMs) currently hold state-of-the-art performance...

11/23/2017 · An Improved Training Procedure for Neural Autoregressive Data Completion
Neural autoregressive models are explicit density estimators that achiev...

03/22/2017 · LogitBoost autoregressive networks
Multivariate binary distributions can be decomposed into products of uni...

10/05/2020 · Inference Strategies for Machine Translation with Conditional Masking
Conditional masked language model (CMLM) training has proven successful ...

09/13/2019 · Flow Models for Arbitrary Conditional Likelihoods
Understanding the dependencies among features of a dataset is at the cor...

10/25/2021 · Distributionally Robust Recurrent Decoders with Random Network Distillation
Neural machine learning models can successfully model language that is s...
