A Unified Approach to Coreset Learning

Alaa Maalouf, Gilad Eini, Ben Mussay, Dan Feldman, Margarita Osadchy

Research output: Contribution to journal › Article › peer-review

Abstract

A coreset of a given dataset and loss function is usually a small weighted set that approximates this loss for every query from a given set of queries. Coresets have been shown to be very useful in many applications. However, coreset construction is done in a problem-dependent manner, and it can take years to design a coreset for a specific family of queries and prove its correctness. This can limit the use of coresets in practical applications. Moreover, small coresets provably do not exist for many problems. To address these limitations, we propose a generic, learning-based algorithm for constructing coresets. Our approach offers a new definition of a coreset, which is a natural relaxation of the standard definition and aims at approximating the average loss of the original data over the queries. This allows us to use a learning paradigm to compute a small coreset of a given set of inputs with respect to a given loss function using a training set of queries. We derive formal guarantees for the proposed approach. Experimental evaluation on deep networks and classic machine learning problems shows that our learned coresets yield comparable or even better results than existing algorithms with worst-case theoretical guarantees (which may be too pessimistic in practice). Furthermore, our approach applied to deep network pruning provides the first coreset for a full deep network, i.e., it compresses the entire network at once, rather than layer by layer or through similar divide-and-conquer methods.
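
The relaxed definition in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, since the abstract does not specify the training procedure. The sketch assumes a sum-of-squared-distances loss, a uniformly sampled candidate subset, and ordinary least squares as a stand-in for the learning step; the names `loss`, `fit_coreset_weights`, and `relative_error` are hypothetical.

```python
import numpy as np

def loss(points, query):
    """Squared distance from each point to one query (a candidate center)."""
    return np.sum((points - query) ** 2, axis=1)

def fit_coreset_weights(data, coreset_points, train_queries):
    """Fit weights so the weighted coreset loss matches the full loss on training queries."""
    # Row j holds the per-point losses of the coreset points on training query j.
    A = np.stack([loss(coreset_points, q) for q in train_queries])
    # Target: total loss of the full dataset on each training query.
    b = np.array([loss(data, q).sum() for q in train_queries])
    # Least squares is an assumed stand-in for the learning step;
    # rcond discards near-zero singular values for numerical stability.
    weights, *_ = np.linalg.lstsq(A, b, rcond=1e-10)
    return weights

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))                       # full dataset
coreset_points = data[rng.choice(len(data), size=20, replace=False)]
train_queries = rng.normal(scale=2.0, size=(200, 2))    # training set of queries
test_queries = rng.normal(scale=2.0, size=(50, 2))      # held-out queries

weights = fit_coreset_weights(data, coreset_points, train_queries)

def relative_error(query):
    """Relative gap between the weighted coreset loss and the full-data loss."""
    full = loss(data, query).sum()
    approx = weights @ loss(coreset_points, query)
    return abs(approx - full) / full

mean_test_error = float(np.mean([relative_error(q) for q in test_queries]))
print(f"mean relative error on held-out queries: {mean_test_error:.2e}")
```

For this particular loss the total over the dataset is a quadratic function of the query, so a few weighted points can reproduce it closely; other losses would generally leave a nonzero error on held-out queries. The fitted weights are unconstrained and may be negative, which a practical method might want to rule out.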

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 35
Issue number: 5
State: Accepted/In press - 2022

Bibliographical note

Publisher Copyright:
IEEE

Keywords

  • Computational modeling
  • Coresets
  • Machine learning algorithms
  • Q measurement
  • Standards
  • Support vector machines
  • Training
  • Weight measurement
  • data summarization
  • generalization
  • learning

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
