Structural Risk Minimization for C^1,1(R^d) Regression
One means of fitting functions to high-dimensional data is to impose smoothness constraints. Recently, the following smooth function approximation problem was posed by Herbert-Voss et al. (2014): given a finite set E ⊂ R^d and a function f: E → R, interpolate the given information with a function f̂ ∈ Ċ^1,1(R^d) (the class of first-order differentiable functions with Lipschitz-continuous gradients) such that f̂(a) = f(a) for all a ∈ E and the value of Lip(∇f̂) is minimal. They provide an algorithm that constructs such an approximating function f̂ and computes the optimal Lipschitz constant Lip(∇f̂) in the noiseless setting. We address statistical aspects of reconstructing the approximating function f̂ from the closely related class C^1,1(R^d) given samples of noisy data. We observe independent and identically distributed samples y(a) = f(a) + ξ(a) for a ∈ E, where ξ(a) is a noise term and the set E ⊂ R^d is fixed and known. We obtain uniform bounds relating the empirical risk to the true risk over the class F_M = {f ∈ C^1,1(R^d) | Lip(∇f) ≤ M}, where M grows with the number of samples at a rate governed by the metric entropy of the class C^1,1(R^d). Finally, we provide an implementation based on Vaidya's algorithm and support our results with numerical experiments on simulated data.
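To make the constraint Lip(∇f) ≤ M concrete, the following is a minimal, illustrative sketch (not the paper's algorithm) of how finitely many samples of values and gradients constrain the gradient Lipschitz constant. For any f ∈ C^1,1(R^d) with Lip(∇f) ≤ M, every pair a, b ∈ E must satisfy the first-order Taylor bound |f(b) − f(a) − ⟨∇f(a), b − a⟩| ≤ (M/2)|b − a|^2 and the gradient bound |∇f(a) − ∇f(b)| ≤ M|a − b|; maximizing the implied ratios over all pairs therefore yields a lower bound on Lip(∇f). The function name and data below are hypothetical.

```python
import itertools
import math

def lip_grad_lower_bound(points, values, grads):
    """Lower bound on Lip(grad f) from samples (a, f(a), grad f(a)).

    points: list of d-dimensional tuples (the finite set E)
    values: list of floats, f(a) at each point
    grads:  list of d-dimensional tuples, grad f(a) at each point
    """
    bound = 0.0
    for i, j in itertools.combinations(range(len(points)), 2):
        diff = [bj - ai for ai, bj in zip(points[i], points[j])]  # b - a
        r2 = sum(t * t for t in diff)                              # |b - a|^2
        # Taylor bound: |f(b) - f(a) - <grad f(a), b - a>| <= (M/2)|b - a|^2
        taylor = abs(values[j] - values[i]
                     - sum(g * t for g, t in zip(grads[i], diff)))
        bound = max(bound, 2.0 * taylor / r2)
        # Gradient bound: |grad f(a) - grad f(b)| <= M |a - b|
        gdiff = math.sqrt(sum((gi - gj) ** 2
                              for gi, gj in zip(grads[i], grads[j])))
        bound = max(bound, gdiff / math.sqrt(r2))
    return bound

# Example: f(x) = |x|^2 / 2 has grad f(x) = x and Lip(grad f) = 1.
E = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [0.5 * (x * x + y * y) for x, y in E]
print(lip_grad_lower_bound(E, vals, E))  # → 1.0
```

These pairwise inequalities are necessary but not sufficient for extendability with constant M, so the returned value is only a lower bound on the minimal Lip(∇f); computing the exact constant requires the interpolation machinery of Herbert-Voss et al. (2014).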