New Coresets for Projective Clustering and Applications

Murad Tukan, Xuan Wu, Samson Zhou, Vladimir Braverman, Dan Feldman

Research output: Contribution to journal, Conference article, peer-review

Abstract

(j, k)-projective clustering is the natural generalization of the family of k-clustering and j-subspace clustering problems. Given a set of points P in ℝ^d, the goal is to find k flats of dimension j, i.e., affine subspaces, that best fit P under a given distance measure. In this paper, we propose the first algorithm that returns an L∞ coreset of size polynomial in d. Moreover, we give the first strong coreset construction for general M-estimator regression. Specifically, we show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, L1-L2, and Fair regression, as well as general concave and power-bounded loss functions. Finally, we provide experimental results based on real-world datasets, showing the efficacy of our approach.
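To make the objectives in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm. It scores a candidate set of k affine j-flats against a point set in two ways. The first is the L∞ cost, the largest Euclidean distance from any point to its nearest flat; this is the quantity an L∞ coreset must preserve for every choice of flats. The second is a robust cost that sums an M-estimator loss of each distance. The names `Flat`, `orthonormalize`, `m_estimators`, `linf_cost`, and `m_estimator_cost` are hypothetical. The Euclidean distance and the scale parameter `c` are assumptions, and the loss formulas are common textbook forms that the paper may parameterize differently. In the paper's regression setting the losses act on regression residuals, so applying them to point-to-flat distances here is only illustrative. Constructing the coreset itself is not shown.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(directions: Sequence[Sequence[float]]) -> List[Vector]:
    """Gram-Schmidt: turn spanning direction vectors into an orthonormal basis."""
    basis: List[Vector] = []
    for v in directions:
        w = [float(x) for x in v]
        for q in basis:
            c = _dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(_dot(w, w))
        if norm > 1e-12:  # skip (nearly) linearly dependent directions
            basis.append([wi / norm for wi in w])
    return basis


class Flat:
    """Hypothetical helper: affine j-flat {offset + sum_i y_i * basis_i}."""

    def __init__(self, offset: Sequence[float], directions: Sequence[Sequence[float]]):
        self.offset = [float(x) for x in offset]
        self.basis = orthonormalize(directions)

    def distance(self, p: Sequence[float]) -> float:
        # Euclidean distance (an assumption): strip the part of (p - offset) inside the flat.
        r = [pi - bi for pi, bi in zip(p, self.offset)]
        for q in self.basis:
            c = _dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        return math.sqrt(_dot(r, r))


def m_estimators(c: float = 1.0) -> Dict[str, Callable[[float], float]]:
    """Textbook forms of the losses named in the abstract; scale c is assumed."""

    def huber(x: float) -> float:
        a = abs(x)
        return 0.5 * x * x if a <= c else c * (a - 0.5 * c)

    def tukey(x: float) -> float:
        if abs(x) <= c:
            return (c * c / 6.0) * (1.0 - (1.0 - (x / c) ** 2) ** 3)
        return c * c / 6.0  # saturates: far outliers cost a constant

    return {
        "cauchy": lambda x: 0.5 * c * c * math.log1p((x / c) ** 2),
        "welsch": lambda x: 0.5 * c * c * (1.0 - math.exp(-((x / c) ** 2))),
        "huber": huber,
        "geman_mcclure": lambda x: 0.5 * x * x / (1.0 + x * x),
        "tukey": tukey,
        "l1_l2": lambda x: 2.0 * (math.sqrt(1.0 + 0.5 * x * x) - 1.0),
        "fair": lambda x: c * c * (abs(x) / c - math.log1p(abs(x) / c)),
    }


def nearest_distance(p: Sequence[float], flats: Sequence[Flat]) -> float:
    return min(f.distance(p) for f in flats)


def linf_cost(points: Sequence[Sequence[float]], flats: Sequence[Flat]) -> float:
    """L-infinity objective: distance of the worst-fit point to its nearest flat."""
    return max(nearest_distance(p, flats) for p in points)


def m_estimator_cost(points, flats, loss, weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of loss(distance to nearest flat); weights allow scoring a weighted subset."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * loss(nearest_distance(p, flats)) for p, w in zip(points, weights))


# Usage: k = 2 lines (j = 1) in R^3.
flats = [Flat([0, 0, 0], [[1, 0, 0]]), Flat([0, 0, 5], [[0, 1, 0]])]
points = [[2, 0, 0], [3, 1, 0], [0, 4, 5], [1, 1, 3]]
print(linf_cost(points, flats))  # ≈ 2.236, i.e. sqrt(5), set by the point [1, 1, 3]
print(m_estimator_cost(points, flats, m_estimators(c=1.0)["huber"]))  # ≈ 2.236
```

Swapping the `loss` argument shows how the same flats are penalized differently: Tukey and Welsch cap the influence of distant points, while Huber and Fair grow linearly for large distances.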

Original language: English
Pages (from-to): 5391-5415
Number of pages: 25
Journal: Proceedings of Machine Learning Research
Volume: 151
State: Published - 2022
Event: 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022 - Virtual, Online, Spain
Duration: 28 Mar 2022 to 30 Mar 2022

Bibliographical note

Funding Information:
This research was partially supported by the Israel National Cyber Directorate via the BIU Center for Applied Research in Cyber Security, and supported in part by NSF CAREER grant 1652257, NSF grant 1934979, ONR Award N00014-18-1-2364 and the Lifelong Learning Machines program from DARPA/MTO. In addition, Samson Zhou acknowledges support from National Institutes of Health grant 5401 HG 10798-2 and a Simons Investigator Award of David P. Woodruff.

Publisher Copyright:
Copyright © 2022 by the author(s)

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
