Data reduction for weighted and outlier-resistant clustering

Dan Feldman, Leonard J. Schulman

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Statistical data frequently includes outliers; these can distort the results of estimation procedures and optimization problems. For this reason, loss functions which deemphasize the effect of outliers are widely used by statisticians. However, there are relatively few algorithmic results about clustering with outliers. For instance, the k-median with outliers problem uses a loss function f_{c_1,...,c_k}(x) which is equal to the minimum of a penalty h and the least distance between the data point x and a center c_i. The loss-minimizing choice of {c_1, ..., c_k} is an outlier-resistant clustering of the data. This problem is also a natural special case of the k-median with penalties problem considered by [Charikar, Khuller, Mount and Narasimhan SODA'01]. The essential challenge that arises in these optimization problems is data reduction for the weighted k-median problem. We solve this problem, which was previously solved only in one dimension ([Har-Peled FSTTCS'06], [Feldman, Fiat and Sharir FOCS'06]). As a corollary, we also achieve improved data reduction for the k-line-median problem.
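
The loss function described in the abstract can be illustrated with a minimal sketch. This is not the paper's data-reduction algorithm; it only evaluates the capped loss min(h, min_i dist(x, c_i)) summed over a point set, assuming Euclidean distance. The function name `outlier_resistant_cost` and the sample points, centers, and penalty are hypothetical and chosen purely for illustration.

```python
import math

def outlier_resistant_cost(points, centers, h):
    """Sum over all points x of min(h, distance from x to its nearest center).

    Assumption: Euclidean distance; the abstract does not fix a metric.
    """
    total = 0.0
    for x in points:
        nearest = min(math.dist(x, c) for c in centers)
        total += min(h, nearest)  # the penalty h caps each point's contribution
    return total

# Hypothetical data: two tight groups plus one far-away outlier.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

capped = outlier_resistant_cost(points, centers, h=5.0)
uncapped = outlier_resistant_cost(points, centers, h=math.inf)
print(capped, uncapped)
```

With the penalty h = 5 the outlier at (100, 0) contributes at most 5 to the cost, whereas with h set to infinity (ordinary k-median cost) its distance of 90 dominates the total. This is the sense in which the capped loss deemphasizes outliers.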

Original language: English
Title of host publication: Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012
Publisher: Association for Computing Machinery
Pages: 1343-1354
Number of pages: 12
ISBN (Print): 9781611972108
State: Published - 2012
Externally published: Yes
Event: 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012 - Kyoto, Japan
Duration: 17 Jan 2012 to 19 Jan 2012

Publication series

Name: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

Conference

Conference: 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012
Country/Territory: Japan
City: Kyoto
Period: 17/01/12 to 19/01/12

ASJC Scopus subject areas

  • Software
  • General Mathematics
