Coreset for Line-Sets Clustering

Sagi Lotan, Ernesto Evgeniy Sanches Shayda, Dan Feldman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The input to the line-sets $k$-median problem is an integer $k \geq 1$ and a set $\mathcal{L} = \{L_1, \ldots, L_n\}$ that contains $n$ sets of lines in $\mathbb{R}^d$. The goal is to compute a set $C$ of $k$ centers (points in $\mathbb{R}^d$) that minimizes the sum $\sum_{L \in \mathcal{L}} \min_{\ell \in L,\, c \in C} \mathrm{dist}(\ell, c)$ of Euclidean distances from each set to its closest center, where $\mathrm{dist}(\ell, c) := \min_{x \in \ell} \|x - c\|_2$. An $\varepsilon$-coreset for this problem is a weighted subset of sets in $\mathcal{L}$ that approximates this sum up to a $1 \pm \varepsilon$ multiplicative factor, for every set $C$ of $k$ centers. We prove that every such input set $\mathcal{L}$ has a small $\varepsilon$-coreset, and provide the first coreset construction for this problem and its variants. The coreset consists of $O(\log^2 n)$ weighted line-sets from $\mathcal{L}$, and is constructed in $O(n \log n)$ time for every fixed $d, k \geq 1$ and $\varepsilon \in (0, 1)$. The main technique is based on a novel reduction to a “fair clustering” of colored points to colored centers. We then provide a coreset for this coloring problem, which may be of independent interest. Open source code and experiments are also provided.
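
The cost function defined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code and it does not construct a coreset; it only evaluates the line-sets k-median cost for a given set of centers, and checks the 1 ± ε condition for one candidate weighted subset and one set of centers. The representation of a line as a (point, direction) pair and the names `dist_line_point`, `line_sets_cost`, and `within_eps` are assumptions made for illustration.

```python
import math

def dist_line_point(line, c):
    """Euclidean distance from point c to the line {p + t*u : t real}.

    `line` is a hypothetical (p, u) pair: a point on the line and a
    nonzero direction vector, both given as sequences of floats.
    """
    p, u = line
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm_u for ui in u]                  # normalize the direction
    diff = [ci - pi for ci, pi in zip(c, p)]       # vector from p to c
    t = sum(di * ui for di, ui in zip(diff, u))    # projection length onto u
    residual = [di - t * ui for di, ui in zip(diff, u)]
    return math.sqrt(sum(r * r for r in residual))

def line_sets_cost(line_sets, centers, weights=None):
    """Weighted sum, over line-sets, of the smallest line-to-center distance."""
    if weights is None:
        weights = [1.0] * len(line_sets)
    return sum(
        w * min(dist_line_point(line, c) for line in L for c in centers)
        for w, L in zip(weights, line_sets)
    )

def within_eps(full_cost, approx_cost, eps):
    """True if approx_cost is within a (1 +/- eps) factor of full_cost."""
    return abs(approx_cost - full_cost) <= eps * full_cost

# Two line-sets in the plane: {x-axis} and {line y = 3, line x = 5}.
line_sets = [
    [((0.0, 0.0), (1.0, 0.0))],
    [((0.0, 3.0), (1.0, 0.0)), ((5.0, 0.0), (0.0, 1.0))],
]
centers = [(0.0, 1.0)]
full = line_sets_cost(line_sets, centers)

# A candidate weighted subset: the first line-set only, with weight 3.
approx = line_sets_cost([line_sets[0]], centers, weights=[3.0])
print(full, approx, within_eps(full, approx, eps=0.1))
```

Passing `within_eps` for a single set of centers does not make a subset an ε-coreset: the definition requires the bound to hold for every set C of k centers, which is what the paper's construction guarantees and this sketch does not.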

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Editors: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher: Neural information processing systems foundation
ISBN (Electronic): 9781713871088
State: Published - 2022
Event: 36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: 28 Nov 2022 to 9 Dec 2022

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 35
ISSN (Print): 1049-5258

Conference

Conference: 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/Territory: United States
City: New Orleans
Period: 28/11/22 to 9/12/22

Bibliographical note

Publisher Copyright:
© 2022 Neural information processing systems foundation. All rights reserved.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
