Abstract
A coreset (or core-set) is a small weighted subset Q of an input set P with respect to a given monotonic function f : R → R that provably approximates its fitting loss ∑_{p∈P} f(p · x) for any given x ∈ R^d. Using Q, we can obtain an approximation of the x* that minimizes this loss by running existing optimization algorithms on Q. In this work we provide: (i) a lower bound which proves that there are sets with no coresets smaller than n = |P| for general monotonic loss functions; (ii) a proof that, with an additional common regularization term and under a natural assumption that holds, e.g., for logistic regression and the sigmoid activation function, a small coreset exists for any input P; (iii) a generic coreset construction algorithm that computes such a small coreset Q in O(nd + n log n) time; and (iv) experimental results with open-source code which demonstrate that our coresets are effective and are much smaller in practice than predicted in theory.
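To make the notion of a weighted subset approximating a fitting loss concrete, the following is a minimal sketch and not the construction from the paper. It assumes the fitting loss has the form of a sum of f(p · x) over p in P, takes f(z) = log(1 + e^z) as an example monotonic function, and uses plain uniform sampling with weights n/m as a stand-in for a real coreset. The function names (`fitting_loss`, `uniform_weighted_subset`) and the synthetic data are hypothetical. Uniform sampling carries no worst-case guarantee; it only illustrates how a weighted loss on Q is compared against the loss on P.

```python
import numpy as np


def logistic_loss(z):
    # Example monotonic function f(z) = log(1 + e^z); logaddexp avoids overflow.
    return np.logaddexp(0.0, z)


def fitting_loss(points, weights, x, f=logistic_loss):
    # Weighted fitting loss: sum over rows p of weights[p] * f(p . x).
    return float(np.sum(weights * f(points @ x)))


def uniform_weighted_subset(P, m, rng):
    # Stand-in only (NOT the paper's algorithm): pick m rows uniformly
    # without replacement and give each the weight n / m.
    n = P.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    return P[idx], np.full(m, n / m)


rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 5))          # synthetic input set, n = 10,000, d = 5
Q, w = uniform_weighted_subset(P, 500, rng)

x = rng.normal(size=5)                     # an arbitrary query vector
full = fitting_loss(P, np.ones(len(P)), x)
approx = fitting_loss(Q, w, x)
rel_err = abs(full - approx) / full
print(f"full loss = {full:.2f}, subset loss = {approx:.2f}, relative error = {rel_err:.3f}")
```

A coreset in the sense of the abstract would bound this relative error for every x simultaneously, which is exactly what uniform sampling cannot promise on adversarial inputs.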
| Original language | English |
|---|---|
| Pages (from-to) | 21520-21547 |
| Number of pages | 28 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 162 |
| State | Published - 2022 |
| Event | 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States. Duration: 17 Jul 2022 → 23 Jul 2022 |
Bibliographical note
Publisher Copyright: Copyright © 2022 by the author(s)
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability
Fingerprint
Dive into the research topics of 'Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more'. Together they form a unique fingerprint.