TY - GEN
T1 - Coresets and sketches for high dimensional subspace approximation problems
AU - Feldman, Dan
AU - Monemizadeh, Morteza
AU - Sohler, Christian
AU - Woodruff, David P.
PY - 2010
Y1 - 2010
N2 - We consider the problem of approximating a set P of n points in ℝ^d by a j-dimensional subspace under the ℓ_p measure, in which we wish to minimize the sum of ℓ_p distances from each point of P to this subspace. More generally, the F_q(ℓ_p)-subspace approximation problem asks for a j-subspace that minimizes the sum of q-th powers of ℓ_p-distances to this subspace, up to a multiplicative factor of (1 + ε). We develop techniques for subspace approximation, regression, and matrix approximation that can be used to deal with massive data sets in high dimensional spaces. In particular, we develop coresets and sketches, i.e., small space representations that approximate the input point set P with respect to the subspace approximation problem. Our results are: • A dimensionality reduction method that can be applied to F_q(ℓ_p)-clustering and shape fitting problems, such as those in [8, 15]. • The first strong coreset for F_1(ℓ_2)-subspace approximation in high-dimensional spaces, i.e., of size polynomial in the dimension of the space. This coreset approximates the distances to any j-subspace (not just the optimal one). • A (1 + ε)-approximation algorithm for the j-dimensional F_1(ℓ_2)-subspace approximation problem with running time nd(j/ε)^{O(1)} + (n + d)2^{poly(j/ε)}. • A streaming algorithm that maintains a coreset for the F_1(ℓ_2)-subspace approximation problem and uses a space of d(2^{√(log n)}/ε^2)^{poly(j)} (weighted) points. • Streaming algorithms for the above problems with bounded precision in the turnstile model, i.e., when coordinates appear in an arbitrary order and undergo multiple updates. We show that bounded precision can lead to further improvements. We extend results of [7] for approximate linear regression, distances to subspace approximation, and optimal rank-j approximation, to error measures other than the Frobenius norm.
AB - We consider the problem of approximating a set P of n points in ℝ^d by a j-dimensional subspace under the ℓ_p measure, in which we wish to minimize the sum of ℓ_p distances from each point of P to this subspace. More generally, the F_q(ℓ_p)-subspace approximation problem asks for a j-subspace that minimizes the sum of q-th powers of ℓ_p-distances to this subspace, up to a multiplicative factor of (1 + ε). We develop techniques for subspace approximation, regression, and matrix approximation that can be used to deal with massive data sets in high dimensional spaces. In particular, we develop coresets and sketches, i.e., small space representations that approximate the input point set P with respect to the subspace approximation problem. Our results are: • A dimensionality reduction method that can be applied to F_q(ℓ_p)-clustering and shape fitting problems, such as those in [8, 15]. • The first strong coreset for F_1(ℓ_2)-subspace approximation in high-dimensional spaces, i.e., of size polynomial in the dimension of the space. This coreset approximates the distances to any j-subspace (not just the optimal one). • A (1 + ε)-approximation algorithm for the j-dimensional F_1(ℓ_2)-subspace approximation problem with running time nd(j/ε)^{O(1)} + (n + d)2^{poly(j/ε)}. • A streaming algorithm that maintains a coreset for the F_1(ℓ_2)-subspace approximation problem and uses a space of d(2^{√(log n)}/ε^2)^{poly(j)} (weighted) points. • Streaming algorithms for the above problems with bounded precision in the turnstile model, i.e., when coordinates appear in an arbitrary order and undergo multiple updates. We show that bounded precision can lead to further improvements. We extend results of [7] for approximate linear regression, distances to subspace approximation, and optimal rank-j approximation, to error measures other than the Frobenius norm.
UR - http://www.scopus.com/inward/record.url?scp=77951676610&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973075.53
DO - 10.1137/1.9781611973075.53
M3 - Conference contribution
AN - SCOPUS:77951676610
SN - 9780898717013
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 630
EP - 649
BT - Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms
PB - Association for Computing Machinery (ACM)
T2 - 21st Annual ACM-SIAM Symposium on Discrete Algorithms
Y2 - 17 January 2010 through 19 January 2010
ER -