Dimensionality-Dependent Generalization Bounds for k-Dimensional Coding Schemes

by Tongliang Liu, et al.

k-dimensional coding schemes are a collection of methods that represent data using a set of representative k-dimensional vectors; they include non-negative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of k-dimensional coding schemes are mainly dimensionality-independent. A major advantage of such bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for k-dimensional coding schemes that are tighter than dimensionality-independent bounds when data lie in a finite-dimensional feature space? The answer is positive. In this paper, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order O((mk ln(mkn)/n)^λ_n), where m is the dimension of the features, k is the number of columns in the linear implementation of the coding scheme, n is the sample size, λ_n > 0.5 when n is finite, and λ_n = 0.5 when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function and converges faster. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to the dimensionality-independent generalization bounds.
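To get a feel for how the stated rate behaves, the following minimal sketch evaluates the term inside the O(·), reading it as (mk ln(mkn)/n)^λ_n with a natural logarithm. The function name `rate_term` and the default exponent 0.5 (the limiting value of λ_n given above) are assumptions made for illustration. Constants hidden by the O-notation are ignored, so the numbers indicate scaling only and are not values of the actual bound.

```python
import math


def rate_term(m: int, k: int, n: int, lam: float = 0.5) -> float:
    """Evaluate (m*k*ln(m*k*n)/n)**lam, the rate term without hidden constants.

    m   : feature dimension
    k   : number of columns in the linear implementation of the coding scheme
    n   : sample size
    lam : exponent; 0.5 is the limiting value, finite n gives a value above 0.5
          (the exact dependence of the exponent on n is not modelled here)
    """
    if min(m, k, n) < 1:
        raise ValueError("m, k and n must be positive integers")
    if lam <= 0:
        raise ValueError("lam must be positive")
    return (m * k * math.log(m * k * n) / n) ** lam


if __name__ == "__main__":
    # Illustrative sizes only (hypothetical, not taken from the paper).
    for n in (10**4, 10**5, 10**6):
        print(n, rate_term(m=10, k=5, n=n))
```

With m and k fixed, the term shrinks as n grows. Whenever the base mk ln(mkn)/n is below 1, an exponent above 0.5 makes the term smaller than the square-root rate would.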




