A Direct Sum Result for the Information Complexity of Learning

Ido Nachum, Jonathan Shafer, Amir Yehudayoff

Research output: Contribution to journal › Conference article › peer-review


How many bits of information are required to PAC learn a class of hypotheses of VC dimension d? The mathematical setting we follow is that of Bassily et al., where the value of interest is the mutual information I(S; A(S)) between the input sample S and the hypothesis output by the learning algorithm A. We introduce a class of functions of VC dimension d over the domain X with information complexity at least Ω(d log log(|X|/d)) bits for any consistent and proper algorithm (deterministic or random). Bassily et al. proved a similar (but quantitatively weaker) result for the case d = 1. The above result is in fact a special case of a more general phenomenon we explore. We define the notion of information complexity of a given class of functions H. Intuitively, it is the minimum amount of information that an algorithm for H must retain about its input to ensure consistency and properness. We prove a direct sum result for information complexity in this context; roughly speaking, the information complexity sums when combining several classes.
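As a toy illustration of the quantity I(S; A(S)), the sketch below computes it exactly for a small case. Everything in it is an assumption made for illustration and is not taken from the paper: the class is one-dimensional thresholds over the domain {0, ..., n-1}, sample points are drawn uniformly and independently, the labels come from a fixed target threshold, and the hypothetical learner `threshold_learner` returns the smallest positively labeled point (a consistent and proper rule). The function names are likewise hypothetical.

```python
import itertools
import math
from collections import Counter


def threshold_learner(sample, n):
    """Hypothetical consistent, proper learner for thresholds h_t(x) = [x >= t].

    Returns the smallest positively labeled point, or n if there is none.
    """
    positives = [x for x, y in sample if y == 1]
    return min(positives) if positives else n


def mutual_information(joint):
    """I(U; V) in bits for a joint distribution given as {(u, v): probability}."""
    pu, pv = Counter(), Counter()
    for (u, v), p in joint.items():
        pu[u] += p
        pv[v] += p
    return sum(
        p * math.log2(p / (pu[u] * pv[v]))
        for (u, v), p in joint.items()
        if p > 0
    )


def sample_output_information(n, m, target):
    """Exact I(S; A(S)) for m uniform i.i.d. points labeled by [x >= target]."""
    joint = Counter()
    prob = 1.0 / n**m  # every ordered tuple of m points is equally likely
    for xs in itertools.product(range(n), repeat=m):
        sample = tuple((x, int(x >= target)) for x in xs)
        joint[(sample, threshold_learner(sample, n))] += prob
    return mutual_information(joint)


if __name__ == "__main__":
    print(sample_output_information(n=4, m=2, target=2))
```

Because this learner is deterministic, I(S; A(S)) equals the entropy of the output hypothesis, so the value is at most log2(n + 1) bits in this toy setting. The enumeration is exponential in m and is meant only for very small n and m.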

Original language: English
Pages (from-to): 1547-1568
Number of pages: 22
Journal: Proceedings of Machine Learning Research
State: Published - 2018
Externally published: Yes
Event: 31st Annual Conference on Learning Theory, COLT 2018 - Stockholm, Sweden
Duration: 6 Jul 2018 → 9 Jul 2018

Bibliographical note

Publisher Copyright:
© 2018 I. Nachum, J. Shafer & A. Yehudayoff.


  • Direct Sum
  • Information Theory
  • PAC Learning
  • VC Dimension

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

