Learning the Hypotheses Space from Data Part I: Learning Space and U-curve Property

01/26/2020
by Diego Marcondes et al.

The agnostic PAC learning model consists of: a Hypothesis Space H; a probability distribution P; a sample complexity function m_H(ϵ, δ): [0,1]^2 → Z_+ of precision ϵ and confidence 1 − δ; a finite i.i.d. sample D_N; a cost function ℓ; and a learning algorithm A(H, D_N), which estimates ĥ ∈ H, an approximation of a target function h^⋆ ∈ H, seeking to minimize the out-of-sample error. In this model, prior information is represented by H and ℓ, and problems are solved by instantiating them in one of several applied learning models, each with a specific algebraic structure for H and corresponding learning algorithms. However, these applied models rely on two important concepts not covered by classic PAC learning theory: model selection and regularization. This paper presents an extension of the model that covers these concepts. The main principle added is the selection, based solely on data, of a subspace of H with a VC dimension compatible with the available sample. To formalize this principle, the concept of the Learning Space L(H), a poset of subsets of H that covers H and satisfies a property regarding the VC dimension of related subspaces, is presented as the natural search space for model selection algorithms. A remarkable result obtained in this new framework is a set of conditions on L(H) and ℓ under which the estimated out-of-sample error surfaces are true U-curves on the chains of L(H), enabling a more efficient search on L(H). Hence, in this new framework, the U-curve optimization problem becomes a natural component of model selection algorithms.
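To illustrate how a U-curve on a chain can be exploited, here is a minimal Python sketch of a search over a single chain of L(H). The names chain (a list of candidate subspaces ordered by VC dimension) and estimate_error (a function returning the estimated out-of-sample error of a subspace) are hypothetical placeholders, not the paper's API; the sketch assumes the error sequence along the chain is strictly unimodal, so a discrete ternary search locates the minimizer with O(log n) error evaluations instead of an exhaustive O(n) scan.

    # Minimal sketch (not from the paper): discrete ternary search over one
    # chain of the Learning Space. `chain` and `estimate_error` are
    # hypothetical placeholders; the U-curve property guarantees the error
    # sequence along the chain is unimodal, which this search exploits.
    def ucurve_chain_search(chain, estimate_error):
        """Index of the subspace minimizing the estimated out-of-sample
        error along a chain whose errors form a strict U-curve."""
        lo, hi = 0, len(chain) - 1
        while hi - lo > 2:
            m1 = lo + (hi - lo) // 3
            m2 = hi - (hi - lo) // 3
            if estimate_error(chain[m1]) < estimate_error(chain[m2]):
                # On a strict U-curve the minimum cannot lie at or right of m2.
                hi = m2 - 1
            else:
                # Symmetrically, the minimum cannot lie at or left of m1.
                lo = m1 + 1
        # Direct scan over the few remaining candidates.
        return min(range(lo, hi + 1), key=lambda i: estimate_error(chain[i]))

    # Example with precomputed errors standing in for estimated errors:
    errors = [0.41, 0.30, 0.22, 0.25, 0.33, 0.47]  # U-curve along a chain
    print(ucurve_chain_search(errors, lambda e: e))  # -> 2

In practice each call to estimate_error would involve training on D_N, so memoizing its results (e.g., with functools.lru_cache) avoids redundant work, and a full model selection algorithm would repeat such a search over many chains of L(H).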
