Coresets for vector summarization with applications to network graphs

Dan Feldman, Sedat Ozer, Daniela Rus

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

We provide a deterministic data summarization algorithm that approximates the mean $\bar{p} = \frac{1}{n}\sum_{p \in P} p$ of a set $P$ of $n$ vectors in $\mathbb{R}^d$ by a weighted mean $\tilde{p}$ of a subset of $O(1/\varepsilon)$ vectors, i.e., a subset whose size is independent of both $n$ and $d$. We prove that the squared Euclidean distance between $\bar{p}$ and $\tilde{p}$ is at most $\varepsilon$ times the variance of $P$. We use this algorithm to maintain an approximate sum of vectors from an unbounded stream, using memory that is independent of $d$ and logarithmic in the number $n$ of vectors seen so far. Our main application is to extract and represent in a compact way friend groups and activity summaries of users from underlying data exchanges. For example, in mobile networks we can use GPS traces to identify meetings, and in social networks we can use information exchange to identify friend groups. Our algorithm provably identifies the Heavy Hitter entries in a proximity (adjacency) matrix; these Heavy Hitters can then be used to build the compact friend-group and activity summaries described above. We evaluate the algorithm on several large data sets.
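The abstract does not describe how the subset is chosen. As a rough illustration of the underlying problem (replacing $\bar{p}$ by a sparse weighted mean $\tilde{p}$), the following is a minimal Python sketch of a Frank-Wolfe (conditional-gradient) greedy procedure. This is a swapped-in, generic technique and not necessarily the authors' algorithm. Each iteration puts weight on at most one new input vector, so after $k$ iterations at most $k+1$ vectors carry nonzero weight. Unlike the result stated above, the sketch makes no claim of an $\varepsilon$-times-variance guarantee. All function names (`frank_wolfe_mean_coreset`, `weighted_mean`, `squared_error`) are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sub(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def mean(points: Sequence[Vector]) -> List[float]:
    """Plain (unweighted) mean of the input vectors."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def weighted_mean(points: Sequence[Vector], weights: Dict[int, float]) -> List[float]:
    """Weighted mean of the selected subset; weights are assumed to sum to 1."""
    d = len(points[0])
    return [sum(w * points[i][j] for i, w in weights.items()) for j in range(d)]


def squared_error(points: Sequence[Vector], weights: Dict[int, float]) -> float:
    """Squared Euclidean distance between the true mean and the weighted mean."""
    r = _sub(weighted_mean(points, weights), mean(points))
    return _dot(r, r)


def frank_wolfe_mean_coreset(points: Sequence[Vector], iterations: int) -> Dict[int, float]:
    """Greedy sparse approximation of the mean (hypothetical helper).

    Returns {index: weight} with positive weights summing to 1 and at
    most iterations + 1 entries.
    """
    mu = mean(points)
    # Start at the input vector closest to the true mean.
    start = min(range(len(points)),
                key=lambda i: _dot(_sub(points[i], mu), _sub(points[i], mu)))
    weights = {start: 1.0}
    c = list(points[start])  # current weighted mean
    for _ in range(iterations):
        residual = _sub(c, mu)  # proportional to the gradient of ||c - mu||^2
        if _dot(residual, residual) == 0.0:
            break  # already exact
        # Linear-minimization step: the vertex most opposed to the residual.
        i = min(range(len(points)), key=lambda k: _dot(points[k], residual))
        step = _sub(points[i], c)
        denom = _dot(step, step)
        if denom == 0.0:
            break
        # Exact line search on ||c + g * step - mu||^2, clipped to [0, 1].
        g = max(0.0, min(1.0, _dot(_sub(mu, c), step) / denom))
        if g == 0.0:
            break
        # Shrink old weights, move mass g onto vertex i, drop zero weights.
        weights = {k: (1.0 - g) * w for k, w in weights.items() if (1.0 - g) * w > 0.0}
        weights[i] = weights.get(i, 0.0) + g
        c = [ci + g * si for ci, si in zip(c, step)]
    return weights


if __name__ == "__main__":
    pts = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    w = frank_wolfe_mean_coreset(pts, iterations=10)
    print(w, weighted_mean(pts, w), squared_error(pts, w))
```

On the four corners of a square, the sketch puts weight 0.5 on two opposite corners, whose weighted mean coincides exactly with $\bar{p} = (1, 1)$. Because the step size is chosen by exact line search over $[0, 1]$, the squared error never increases from one iteration to the next.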

Original language: English
Title of host publication: 34th International Conference on Machine Learning, ICML 2017
Publisher: International Machine Learning Society (IMLS)
Pages: 1847-1855
Number of pages: 9
ISBN (Electronic): 9781510855144
State: Published - 2017
Event: 34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: 6 Aug 2017 to 11 Aug 2017

Publication series

Name: 34th International Conference on Machine Learning, ICML 2017
Volume: 3

Conference

Conference: 34th International Conference on Machine Learning, ICML 2017
Country/Territory: Australia
City: Sydney
Period: 6/08/17 to 11/08/17

Bibliographical note

Publisher Copyright:
Copyright 2017 by the author(s).

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software
