A Direct Sum Result for the Information Complexity of Learning

04/16/2018
by Ido Nachum, et al.

How many bits of information are required to PAC learn a class of hypotheses of VC dimension d? The mathematical setting we follow is that of Bassily et al. (2018), where the value of interest is the mutual information I(S;A(S)) between the input sample S and the hypothesis output by the learning algorithm A. We introduce a class of functions of VC dimension d over a domain X whose information complexity is at least Ω(d log log(|X|/d)) bits for any consistent and proper algorithm (deterministic or randomized). Bassily et al. proved a similar (but quantitatively weaker) result for the case d=1. This result is in fact a special case of a more general phenomenon we explore. We define the notion of information complexity of a given class of functions H. Intuitively, it is the minimum amount of information that an algorithm for H must retain about its input in order to ensure consistency and properness. We prove a direct sum result for information complexity in this context: roughly speaking, the information complexity sums when combining several classes.
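To make these quantities concrete, the following is a minimal LaTeX sketch of the two central notions under one natural formalization; the shorthand IC(H), the combination operator ⊕, the sample size m, and the exact order of quantifiers are illustrative choices here, not verbatim definitions from the paper.

```latex
% Hedged sketch of the central quantities (notation here is illustrative,
% not verbatim from the paper).
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

% S: an input sample of m examples drawn i.i.d. from a distribution D;
% A(S): the hypothesis the learner A outputs; I(.;.): mutual information.

% Information complexity of a class H: the least information any consistent
% and proper learner for H must retain about its input, in the worst case
% over input distributions.
\[
  \mathrm{IC}(H) \;=\; \min_{A} \; \sup_{D} \; I\bigl(S;\, A(S)\bigr),
  \qquad S \sim D^{m},
\]
% where the minimum ranges over algorithms A that are consistent and
% proper for H.

% The direct sum phenomenon, roughly stated: for a suitable combination
% H_1 \oplus H_2 of classes over disjoint domains, complexities add up.
\[
  \mathrm{IC}(H_1 \oplus H_2) \;\gtrsim\; \mathrm{IC}(H_1) + \mathrm{IC}(H_2).
\]

\end{document}
```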
