A PTAS for k-means clustering based on weak coresets

Dan Feldman, Morteza Monemizadeh, Christian Sohler

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Given a point set $P \subseteq \mathbb{R}^d$, the k-means clustering problem is to find a set $C = \{c_1, \dots, c_k\}$ of k points and a partition of P into k clusters $C_1, \dots, C_k$ such that the sum of squared errors $\sum_{i=1}^{k} \sum_{p \in C_i} \|p - c_i\|_2^2$ is minimized. For given centers, this cost function is minimized by assigning each point to its nearest center. The k-means cost function is probably the most widely used cost function in the area of clustering. In this paper we show that every unweighted point set P has a weak $(\varepsilon, k)$-coreset of size $\mathrm{Poly}(k, 1/\varepsilon)$ for the k-means clustering problem, i.e., its size is independent of the cardinality $|P|$ of the point set and of the dimension d of the Euclidean space $\mathbb{R}^d$. A weak coreset is a weighted set $S \subseteq P$ together with a set T such that T contains a $(1+\varepsilon)$-approximation of the optimal cluster centers for P, and for every set of k centers from T the (weighted) cost of the centers for S is a $(1 \pm \varepsilon)$-approximation of their cost for P. We apply our weak coreset to obtain a PTAS for the k-means clustering problem with running time $O(nkd + d \cdot \mathrm{Poly}(k/\varepsilon) + 2^{\tilde{O}(k/\varepsilon)})$.
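To make the cost function and the weak-coreset definition concrete, the following is a minimal Python sketch (the page itself contains no code). It evaluates the weighted k-means cost with nearest-center assignment and checks only the second weak-coreset condition, by brute force over every k-subset of T. The function names `kmeans_cost` and `check_weak_coreset_cost` are hypothetical; this is not the paper's coreset construction or its PTAS, and it does not verify that T contains a $(1+\varepsilon)$-approximate solution.

```python
from itertools import combinations


def sq_dist(p, c):
    """Squared Euclidean distance ||p - c||_2^2."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kmeans_cost(points, centers, weights=None):
    """Hypothetical helper: weighted k-means cost.

    Each point is assigned to its nearest center, which minimizes the
    cost for fixed centers, and contributes weight * squared distance.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def check_weak_coreset_cost(P, S, weights, T, k, eps):
    """Hypothetical checker for the cost condition only.

    Returns True if, for every set of k centers drawn from T, the weighted
    cost of S lies within a factor (1 +/- eps) of the cost of P.
    Brute force over all k-subsets of T, so exponential in k.
    """
    for C in combinations(T, k):
        cost_P = kmeans_cost(P, C)
        cost_S = kmeans_cost(S, C, weights)
        if not ((1 - eps) * cost_P <= cost_S <= (1 + eps) * cost_P):
            return False
    return True


if __name__ == "__main__":
    P = [(0, 0), (0, 1), (10, 0), (10, 1)]
    S = [(0, 0), (10, 0)]          # S is a subset of P
    w = [2, 2]                     # each sample point stands in for two points
    T = [(0, 0.5), (10, 0.5), (5, 0)]
    print(check_weak_coreset_cost(P, S, w, T, k=2, eps=0.1))
```

On this toy input every 2-subset of T gives a cost ratio of at least 50.5/51.5 (about 0.98), so the check passes for eps = 0.1 but fails for eps = 0.01. The brute-force loop only illustrates the definition; the point of the paper's result is that S and T have size polynomial in k and 1/ε, independent of |P| and d.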

Original language: English
Title of host publication: Proceedings of the Twenty-third Annual Symposium on Computational Geometry, SCG'07
Pages: 11-18
Number of pages: 8
State: Published - 2007
Externally published: Yes
Event: 23rd Annual Symposium on Computational Geometry, SCG'07 - Gyeongju, Korea, Republic of
Duration: 6 Jun 2007 to 8 Jun 2007

Publication series

Name: Proceedings of the Annual Symposium on Computational Geometry

Conference

Conference: 23rd Annual Symposium on Computational Geometry, SCG'07
Country/Territory: Korea, Republic of
City: Gyeongju
Period: 6/06/07 to 8/06/07

Keywords

  • Approximation
  • Coresets
  • Geometric optimization
  • K-means

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Geometry and Topology
  • Computational Mathematics
