Abstract
Emerging large-scale monitoring applications rely on continuous tracking of complex data-analysis queries over collections of massive, physically distributed data streams. Thus, in addition to the space- and time-efficiency requirements of conventional stream processing (at each remote monitor site), effective solutions also need to guarantee communication efficiency (over the underlying communication network). The complexity of the monitored query adds to the difficulty of the problem; this is especially true for non-linear queries (e.g., joins), where no obvious solutions exist for distributing the monitored condition across sites. The recently proposed geometric method, based on the notion of covering spheres, offers a generic methodology for splitting an arbitrary (non-linear) global condition into a collection of local site constraints, and has been applied to massive distributed stream-monitoring tasks, achieving state-of-the-art performance. In this paper, we present a far more general geometric approach, based on the convex decomposition of an appropriate subset of the domain of the monitoring query, and formally prove that it is always guaranteed to perform at least as well as the covering spheres method. We analyze our approach and demonstrate its effectiveness for the important case of sketch-based approximate tracking for norm, range-aggregate, and join-aggregate queries, which have numerous applications in streaming data analysis. Experimental results on real-life data streams verify the superiority of our approach in practical settings, showing that it substantially outperforms the covering spheres method.
Original language | English |
---|---|
Title of host publication | Proceedings of the VLDB Endowment |
Editors | Ki-Joune Li, Christophe Claramunt, Simonas Saltenis |
Publisher | Association for Computing Machinery |
Pages | 545-556 |
Number of pages | 12 |
Volume | 8 |
Edition | 5 |
State | Published - 2015 |
Event | 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of. Duration: 11 Sep 2006 → 11 Sep 2006 |
Conference
Conference | 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006 |
---|---|
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 11/09/06 → 11/09/06 |
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science