TY - GEN

T1 - A PTAS for k-means clustering based on weak coresets

AU - Feldman, Dan

AU - Monemizadeh, Morteza

AU - Sohler, Christian

PY - 2007

Y1 - 2007

N2 - Given a point set $P \subseteq \mathbb{R}^d$, the k-means clustering problem is to find a set $C=(c_1,\dots,c_k)$ of k points and a partition of P into k clusters $C_1,\dots,C_k$ such that the sum of squared errors $\sum_{i=1}^{k} \sum_{p \in C_i} \|p - c_i\|_2^2$ is minimized. For given centers this cost function is minimized by assigning points to the nearest center. The k-means cost function is probably the most widely used cost function in the area of clustering. In this paper we show that every unweighted point set P has a weak $(\epsilon, k)$-coreset of size $\mathrm{Poly}(k, 1/\epsilon)$ for the k-means clustering problem, i.e. its size is independent of the cardinality $|P|$ of the point set and the dimension d of the Euclidean space $\mathbb{R}^d$. A weak coreset is a weighted set $S \subseteq P$ together with a set T such that T contains a $(1+\epsilon)$-approximation for the optimal cluster centers from P, and for every set of k centers from T the cost of the centers for S is a $(1 \pm \epsilon)$-approximation of the cost for P. We apply our weak coreset to obtain a PTAS for the k-means clustering problem with running time $O(nkd + d \cdot \mathrm{Poly}(k/\epsilon) + 2^{\tilde{O}(k/\epsilon)})$.

AB - Given a point set $P \subseteq \mathbb{R}^d$, the k-means clustering problem is to find a set $C=(c_1,\dots,c_k)$ of k points and a partition of P into k clusters $C_1,\dots,C_k$ such that the sum of squared errors $\sum_{i=1}^{k} \sum_{p \in C_i} \|p - c_i\|_2^2$ is minimized. For given centers this cost function is minimized by assigning points to the nearest center. The k-means cost function is probably the most widely used cost function in the area of clustering. In this paper we show that every unweighted point set P has a weak $(\epsilon, k)$-coreset of size $\mathrm{Poly}(k, 1/\epsilon)$ for the k-means clustering problem, i.e. its size is independent of the cardinality $|P|$ of the point set and the dimension d of the Euclidean space $\mathbb{R}^d$. A weak coreset is a weighted set $S \subseteq P$ together with a set T such that T contains a $(1+\epsilon)$-approximation for the optimal cluster centers from P, and for every set of k centers from T the cost of the centers for S is a $(1 \pm \epsilon)$-approximation of the cost for P. We apply our weak coreset to obtain a PTAS for the k-means clustering problem with running time $O(nkd + d \cdot \mathrm{Poly}(k/\epsilon) + 2^{\tilde{O}(k/\epsilon)})$.

KW - Approximation

KW - Coresets

KW - Geometric optimization

KW - K-means

UR - http://www.scopus.com/inward/record.url?scp=35348830377&partnerID=8YFLogxK

U2 - 10.1145/1247069.1247072

DO - 10.1145/1247069.1247072

M3 - Conference contribution

AN - SCOPUS:35348830377

SN - 1595937056

SN - 9781595937056

T3 - Proceedings of the Annual Symposium on Computational Geometry

SP - 11

EP - 18

BT - Proceedings of the Twenty-third Annual Symposium on Computational Geometry, SCG'07

T2 - 23rd Annual Symposium on Computational Geometry, SCG'07

Y2 - 6 June 2007 through 8 June 2007

ER -