TY - GEN
T1 - Data reduction for weighted and outlier-resistant clustering
AU - Feldman, Dan
AU - Schulman, Leonard J.
PY - 2012
Y1 - 2012
N2 - Statistical data frequently includes outliers; these can distort the results of estimation procedures and optimization problems. For this reason, loss functions which deemphasize the effect of outliers are widely used by statisticians. However, there are relatively few algorithmic results about clustering with outliers. For instance, the k-median with outliers problem uses a loss function fc1,...,ck (x) which is equal to the minimum of a penalty h, and the least distance between the data point x and a center c i. The loss-minimizing choice of {c1, . . . , c k} is an outlier-resistant clustering of the data. This problem is also a natural special case of the k-median with penalties problem considered by [Charikar, Khuller, Mount and Narasimhan SODA'01]. The essential challenge that arises in these optimization problems is data reduction for the weighted k-median problem. We solve this problem, which was previously solved only in one dimension ([Har-Peled FSTTCS'06], [Feldman, Fiat and Sharir FOCS'06]). As a corollary, we also achieve improved data reduction for the k-line-median problem.
AB - Statistical data frequently includes outliers; these can distort the results of estimation procedures and optimization problems. For this reason, loss functions which deemphasize the effect of outliers are widely used by statisticians. However, there are relatively few algorithmic results about clustering with outliers. For instance, the k-median with outliers problem uses a loss function fc1,...,ck (x) which is equal to the minimum of a penalty h, and the least distance between the data point x and a center c i. The loss-minimizing choice of {c1, . . . , c k} is an outlier-resistant clustering of the data. This problem is also a natural special case of the k-median with penalties problem considered by [Charikar, Khuller, Mount and Narasimhan SODA'01]. The essential challenge that arises in these optimization problems is data reduction for the weighted k-median problem. We solve this problem, which was previously solved only in one dimension ([Har-Peled FSTTCS'06], [Feldman, Fiat and Sharir FOCS'06]). As a corollary, we also achieve improved data reduction for the k-line-median problem.
UR - http://www.scopus.com/inward/record.url?scp=84860167138&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973099.106
DO - 10.1137/1.9781611973099.106
M3 - Conference contribution
AN - SCOPUS:84860167138
SN - 9781611972108
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 1343
EP - 1354
BT - Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012
PB - Association for Computing Machinery
T2 - 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012
Y2 - 17 January 2012 through 19 January 2012
ER -