Convergence Rates for Mixture-of-Experts

by   Eduardo F. Mendes, et al.

In mixtures-of-experts (ME) model, where a number of submodels (experts) are combined, there have been two longstanding problems: (i) how many experts should be chosen, given the size of the training data? (ii) given the total number of parameters, is it better to use a few very complex experts, or is it better to combine many simple experts? In this paper, we try to provide some insights to these problems through a theoretic study on a ME structure where m experts are mixed, with each expert being related to a polynomial regression model of order k. We study the convergence rate of the maximum likelihood estimator (MLE), in terms of how fast the Kullback-Leibler divergence of the estimated density converges to the true density, when the sample size n increases. The convergence rate is found to be dependent on both m and k, and certain choices of m and k are found to produce optimal convergence rates. Therefore, these results shed light on the two aforementioned important problems: on how to choose m, and on how m and k should be compromised, for achieving good convergence rates.


page 1

page 2

page 3

page 4


Convergence Rates for Gaussian Mixtures of Experts

We provide a theoretical treatment of over-specified Gaussian mixtures o...

Optimal Bayesian estimation of Gaussian mixtures with growing number of components

We study posterior concentration properties of Bayesian procedures for e...

Convergence Rates of Latent Topic Models Under Relaxed Identifiability Conditions

In this paper we study the frequentist convergence rate for the Latent D...

Spectral Gap of Replica Exchange Langevin Diffusion on Mixture Distributions

Langevin diffusion (LD) is one of the main workhorses for sampling probl...

Convergence rates for pretraining and dropout: Guiding learning parameters using network structure

Unsupervised pretraining and dropout have been well studied, especially ...

Bagged Polynomial Regression and Neural Networks

Series and polynomial regression are able to approximate the same functi...

Please sign up or login with your details

Forgot password? Click here to reset