Sketchbased geometric monitoring of distributed stream queries

Minos Garofalakis, Daniel Keren, Vasilis Samoladas

Research output: Contribution to journalConference articlepeer-review

Abstract

Emerging large-scale monitoring applications rely on continuous tracking of complex data-analysis queries over collections of massive, physically-distributed data streams. Thus, in addition to the space- and time-efficiency requirements of conventional stream processing (at each remote monitor site), effective solutions also need to guarantee communication efficiency (over the underlying communication network). The complexity of the monitored query adds to the difficulty of the problem - this is especially true for nonlinear queries (e.g., joins), where no obvious solutions exist for distributing the monitor condition across sites. The recently proposed geometric method offers a generic methodology for splitting an arbitrary (non-linear) global threshold-monitoring task into a collection of local site constraints; still, the approach relies on maintaining the complete stream(s) at each site, thus raising serious efficiency concerns for massive data streams. In this paper, we propose novel algorithms for efficiently tracking a broad class of complex aggregate queries in such distributed-streams settings. Our tracking schemes rely on a novel combination of the geometric method with compact sketch summaries of local data streams, and maintain approximate answers with provable error guarantees, while optimizing space and processing costs at each remote site and communication cost across the network. One of our key technical insights for the effective use of the geometric method lies in exploiting a much lower-dimensional space for monitoring the sketch-based estimation query. Due to the complex, highly nonlinear nature of these estimates, efficiently monitoring the local geometric constraints poses challenging algorithmic issues for which we propose novel solutions. Experimental results on real-life data streams verify the effectiveness of our approach.

Original languageEnglish
Pages (from-to)937-948
Number of pages12
JournalProceedings of the VLDB Endowment
Volume6
Issue number10
DOIs
StatePublished - Aug 2013
Event39th International Conference on Very Large Data Bases, VLDB 2012 - Trento, Italy
Duration: 26 Aug 201330 Aug 2013

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

Fingerprint

Dive into the research topics of 'Sketchbased geometric monitoring of distributed stream queries'. Together they form a unique fingerprint.

Cite this