Uniform Deviation Bounds for Unbounded Loss Functions like k-Means
Uniform deviation bounds limit the difference between a model's expected loss and its loss on an empirical sample uniformly across all models in a learning problem. As such, they are a critical component of empirical risk minimization. In this paper, we provide a novel framework for obtaining uniform deviation bounds for loss functions that are *unbounded*. As our main application, this allows us to obtain bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of O(m^{-1/2}), compared to the previously known O(m^{-1/4}) rate. Furthermore, we show that the rate also depends on the kurtosis, the normalized fourth moment that measures the "tailedness" of a distribution. We further provide improved rates under progressively stronger assumptions, namely bounded higher moments, subgaussianity, and bounded support.
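For concreteness, the following is a minimal sketch of the two quantities the abstract refers to, written in notation chosen here for illustration (P denotes the data distribution, Q a set of k cluster centers, m the sample size); the paper's own definitions may differ in detail.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Uniform deviation bound (illustrative form): the empirical k-Means
% loss on m samples stays within \varepsilon of the expected loss,
% uniformly over all sets Q of k centers.
\[
  \sup_{Q:\,|Q|=k}\;
  \Bigl|\,\mathbb{E}_{x\sim P}\bigl[\min_{q\in Q}\|x-q\|^2\bigr]
  \;-\; \frac{1}{m}\sum_{i=1}^{m}\min_{q\in Q}\|x_i-q\|^2\Bigr|
  \;\le\; \varepsilon .
\]
% Kurtosis: the standard normalized fourth moment referenced
% in the abstract, for a scalar random variable X.
\[
  \operatorname{Kurt}[X]
  \;=\; \frac{\mathbb{E}\bigl[(X-\mu)^4\bigr]}{\sigma^4},
  \qquad \mu=\mathbb{E}[X],\quad \sigma^2=\operatorname{Var}[X].
\]
\end{document}
```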