Non-asymptotic model selection in block-diagonal mixture of polynomial experts models

by TrungTin Nguyen, et al.

Model selection via penalized-likelihood-type criteria is a standard task in many statistical inference and machine learning problems. Progress has produced criteria with asymptotic consistency results, and there is an increasing emphasis on introducing non-asymptotic criteria. We focus on the problem of modeling non-linear relationships in regression data with potential hidden graph-structured interactions between the high-dimensional predictors, within the mixture of experts modeling framework. To deal with such a complex situation, we investigate a block-diagonal localized mixture of polynomial experts (BLoMPE) regression model, which is built upon an inverse regression and block-diagonal structures of the Gaussian expert covariance matrices. We introduce a penalized maximum likelihood selection criterion to estimate the unknown conditional density of the regression model. This model selection criterion allows us to handle the challenging problem of inferring the number of mixture components, the degree of the polynomial mean functions, and the hidden block-diagonal structures of the covariance matrices. The block-diagonal structure reduces the number of parameters to be estimated and leads to a trade-off between complexity and sparsity in the model. In particular, we provide a strong theoretical guarantee to support the introduced non-asymptotic model selection criterion: a finite-sample oracle inequality satisfied by the penalized maximum likelihood estimator with a Jensen-Kullback-Leibler type loss. The penalty shape of this criterion depends on the complexity of the considered random subcollection of BLoMPE models, including the relevant graph structures, the degree of the polynomial mean functions, and the number of mixture components.
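To make the parameter-reduction argument concrete, the following minimal Python sketch counts the free entries of a block-diagonal covariance matrix and picks a model by a penalized negative log-likelihood. It is an illustration only, not the criterion from the paper: the penalty form `kappa * dim / n`, the function names, and the numbers in the usage note are assumptions introduced here.

```python
def block_cov_params(block_sizes):
    """Free parameters of a symmetric block-diagonal covariance matrix.

    A symmetric d x d block has d * (d + 1) / 2 free entries, so the total
    is the sum over blocks.
    """
    return sum(d * (d + 1) // 2 for d in block_sizes)


def penalized_criterion(neg_log_lik, dim, n, kappa=1.0):
    """Hypothetical criterion: average negative log-likelihood plus a
    penalty proportional to the model dimension (assumed form)."""
    return neg_log_lik / n + kappa * dim / n


def select_model(candidates, n, kappa=1.0):
    """Return the name of the candidate minimizing the penalized criterion.

    candidates maps a model name to (neg_log_lik, dim).
    """
    return min(
        candidates,
        key=lambda name: penalized_criterion(*candidates[name], n, kappa),
    )
```

For six predictors, a full covariance matrix has `block_cov_params([6]) == 21` free entries, while three blocks of size two have `block_cov_params([2, 2, 2]) == 9`. With made-up fits `{"full": (1200.0, 21), "blocks": (1203.0, 9)}` and `n = 500`, `select_model` prefers `"blocks"` at `kappa=1.0`: the slightly worse likelihood is outweighed by the smaller dimension.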




