Near-optimal-sample estimators for spherical Gaussian mixtures
Statistical and machine-learning algorithms are frequently applied to high-dimensional data. In many of these applications data is scarce, and often far more costly than computation time. We provide the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. For mixtures of any k d-dimensional spherical Gaussians, we derive an intuitive spectral estimator that uses O_k(d log^2 d / ϵ^4) samples and runs in time O_{k,ϵ}(d^3 log^5 d), both significantly lower than previously known. The k-dependent constant hidden in O_k is polynomial in k for the sample complexity and exponential in k for the time complexity, again much smaller than what was previously known. We also show that Ω_k(d/ϵ^2) samples are needed for any algorithm, hence the sample complexity is near-optimal in the number of dimensions. We also derive a simple estimator for one-dimensional mixtures that uses O(k log(k/ϵ) / ϵ^2) samples and runs in time O((k/ϵ)^(3k+1)). Our other technical contributions include a faster algorithm for choosing, from a set of candidate distributions, a density estimate that minimizes the ℓ_1 distance to an unknown underlying distribution.
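The abstract does not spell out the spectral estimator, but the general idea behind spectral methods for spherical mixtures is that, after centering, the component means lie (up to sampling noise) in the span of the top k-1 eigenvectors of the sample covariance, so estimation can be carried out in a (k-1)-dimensional subspace instead of all d dimensions. The sketch below is a minimal, hypothetical illustration of that dimension-reduction step only, not the estimator analyzed in the paper; all function and variable names are our own.

```python
import numpy as np

def spectral_projection(X, k):
    """Project samples onto the top-(k-1) principal subspace of the
    centered sample covariance.  For a mixture of k spherical Gaussians
    the component means lie (approximately) in this subspace, so the
    estimation problem can be reduced from d to k-1 dimensions.
    Illustrative sketch only; not the paper's algorithm."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu                              # center the data
    cov = Xc.T @ Xc / len(X)                 # sample covariance (d x d)
    _, vecs = np.linalg.eigh(cov)            # eigenvectors, ascending eigenvalues
    basis = vecs[:, -(k - 1):]               # top-(k-1) directions
    return Xc @ basis, mu, basis             # low-dimensional samples, mean, basis

# Tiny usage example on synthetic data (hypothetical parameters).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, n = 50, 3, 6000
    means = rng.normal(scale=5.0, size=(k, d))      # well-separated component means
    labels = rng.integers(k, size=n)
    X = means[labels] + rng.normal(size=(n, d))     # unit-variance spherical noise
    Y, _, _ = spectral_projection(X, k)
    print(Y.shape)                                  # (6000, 2)
```

In this reduced space, standard low-dimensional mixture-estimation or clustering routines can be applied; the paper's contribution is an analysis showing how few samples suffice for such a scheme, not the projection step itself.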