Dan Feldman, Melanie Schmidt, Christian Sohler
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
We prove that the sum of the squared Euclidean distances from the n rows of an n × d matrix A to any compact set that is spanned by k vectors in ℝ^{d} can be approximated up to a (1 + ε)-factor, for an arbitrarily small ε > 0, using the O(k/ε^{2})-rank approximation of A and a constant. This implies, for example, that the optimal k-means clustering of the rows of A is (1 + ε)-approximated by an optimal k-means clustering of their projection on the first O(k/ε^{2}) right singular vectors (principal components) of A. A (j, k)-coreset for projective clustering is a small set of points that yields a (1 + ε)-approximation to the sum of squared distances from the n rows of A to any set of k affine subspaces, each of dimension at most j. Our embedding yields (0, k)-coresets of size O(k) for handling k-means queries, (j, 1)-coresets of size O(j) for PCA queries, and (j, k)-coresets of size (log n)^{O(jk)} for any j, k ≥ 1 and constant ε ∈ (0, 1/2). Previous coresets usually have a size that depends linearly or even exponentially on d, which makes them useless when d ∼ n. Using our coresets with the merge-and-reduce approach, we obtain embarrassingly parallel streaming algorithms for problems such as k-means, PCA and projective clustering. These algorithms use update time per point and memory that is polynomial in log n and only linear in d. For cost functions other than squared Euclidean distances, we suggest a simple recursive coreset construction that produces coresets of size k^{1/ε^{O(1)}} for k-means and a special class of Bregman divergences that is less dependent on the properties of the squared Euclidean distance.
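The first claim of the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes NumPy is available, uses synthetic data, and picks the rank m by hand rather than from the constants hidden in O(k/ε^{2}). The function names `kmeans_cost`, `low_rank_sketch` and `relative_error` are hypothetical and chosen only for this example. The sketch compares the k-means cost of A for a set of k query centers with the cost of the rank-m approximation A_m plus the constant ‖A − A_m‖_F^{2}.

```python
import numpy as np


def kmeans_cost(points, centers):
    """Sum over all rows of the squared distance to the nearest center."""
    # Pairwise squared distances, shape (n, k).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


def low_rank_sketch(A, m):
    """Return (A_m, c): the best rank-m approximation of A and
    the constant c = ||A - A_m||_F^2 (sum of the discarded squared
    singular values)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_m = (U[:, :m] * s[:m]) @ Vt[:m]
    c = float((s[m:] ** 2).sum())
    return A_m, c


def relative_error(A, centers, m):
    """Relative gap between cost(A, centers) and cost(A_m, centers) + c."""
    A_m, c = low_rank_sketch(A, m)
    exact = kmeans_cost(A, centers)
    approx = kmeans_cost(A_m, centers) + c
    return abs(exact - approx) / exact


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 500, 50, 3
    # Synthetic data (an assumption of this sketch): k clusters plus noise.
    true_centers = rng.normal(scale=5.0, size=(k, d))
    A = true_centers[rng.integers(k, size=n)] + rng.normal(size=(n, d))
    # Arbitrary query centers; any k points span at most k dimensions.
    queries = rng.normal(scale=5.0, size=(k, d))
    for m in (k, 4 * k, d):
        print(f"m = {m:2d}  relative error = {relative_error(A, queries, m):.3e}")
```

Two special cases are exact up to floating-point rounding and make useful sanity checks: m = d, where A_m equals A and the constant is zero, and query centers that lie in the span of the first m right singular vectors, where the cross term between the centers and the discarded part of A vanishes.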
Original language | English |
---|---|
Title of host publication | Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013 |
Publisher | Association for Computing Machinery |
Pages | 1434-1453 |
Number of pages | 20 |
ISBN (Print) | 9781611972511 |
DOIs | |
State | Published - 2013 |
Externally published | Yes |
Event | 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013 - New Orleans, LA, United States. Duration: 6 Jan 2013 → 8 Jan 2013 |
Name | Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms |
---|---|
Conference | 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013 |
---|---|
Country/Territory | United States |
City | New Orleans, LA |
Period | 6/01/13 → 8/01/13 |