TY - GEN

T1 - Data reduction for weighted and outlier-resistant clustering

AU - Feldman, Dan

AU - Schulman, Leonard J.

PY - 2012

Y1 - 2012

N2 - Statistical data frequently includes outliers; these can distort the results of estimation procedures and optimization problems. For this reason, loss functions which deemphasize the effect of outliers are widely used by statisticians. However, there are relatively few algorithmic results about clustering with outliers. For instance, the k-median with outliers problem uses a loss function $f_{c_1,\ldots,c_k}(x)$ which is equal to the minimum of a penalty $h$, and the least distance between the data point $x$ and a center $c_i$. The loss-minimizing choice of $\{c_1, \ldots, c_k\}$ is an outlier-resistant clustering of the data. This problem is also a natural special case of the k-median with penalties problem considered by [Charikar, Khuller, Mount and Narasimhan SODA'01]. The essential challenge that arises in these optimization problems is data reduction for the weighted k-median problem. We solve this problem, which was previously solved only in one dimension ([Har-Peled FSTTCS'06], [Feldman, Fiat and Sharir FOCS'06]). As a corollary, we also achieve improved data reduction for the k-line-median problem.

AB - Statistical data frequently includes outliers; these can distort the results of estimation procedures and optimization problems. For this reason, loss functions which deemphasize the effect of outliers are widely used by statisticians. However, there are relatively few algorithmic results about clustering with outliers. For instance, the k-median with outliers problem uses a loss function $f_{c_1,\ldots,c_k}(x)$ which is equal to the minimum of a penalty $h$, and the least distance between the data point $x$ and a center $c_i$. The loss-minimizing choice of $\{c_1, \ldots, c_k\}$ is an outlier-resistant clustering of the data. This problem is also a natural special case of the k-median with penalties problem considered by [Charikar, Khuller, Mount and Narasimhan SODA'01]. The essential challenge that arises in these optimization problems is data reduction for the weighted k-median problem. We solve this problem, which was previously solved only in one dimension ([Har-Peled FSTTCS'06], [Feldman, Fiat and Sharir FOCS'06]). As a corollary, we also achieve improved data reduction for the k-line-median problem.

UR - http://www.scopus.com/inward/record.url?scp=84860167138&partnerID=8YFLogxK

U2 - 10.1137/1.9781611973099.106

DO - 10.1137/1.9781611973099.106

M3 - Conference contribution

AN - SCOPUS:84860167138

SN - 9781611972108

T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

SP - 1343

EP - 1354

BT - Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012

PB - Association for Computing Machinery

T2 - 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012

Y2 - 17 January 2012 through 19 January 2012

ER -