Abstract
Interest in stream monitoring is shifting toward the distributed case. In many applications the data is high volume, dynamic, and distributed, making it infeasible to collect the distinct streams at a central node for processing. Often, the monitoring problem consists of determining whether the value of a global function, defined on the union of all streams, has crossed a certain threshold. We wish to reduce communication by transforming global monitoring into the testing of local constraints, which each node checks independently. Geometric monitoring (GM) has proved useful for constructing such local constraints for general functions. Alas, in GM the constraints at all nodes share an identical structure and are thus unsuitable for handling heterogeneous streams. Therefore, we propose a general approach for monitoring heterogeneous streams (HGM), which defines constraints tailored to fit the data distribution at each node. While we prove that optimally selecting the constraints is NP-hard, we provide a practical solution, which reduces the running time by hierarchically clustering nodes with similar data distributions and then solving simpler optimization problems. We also present a method for efficiently recovering from local violations at the nodes. In experiments, HGM reduces communication by more than an order of magnitude relative to GM.
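The idea of replacing a global threshold test with node-specific local constraints can be illustrated with a small sketch. This is not the paper's algorithm: it assumes, purely for illustration, that the global function is the Euclidean norm of the average of the local vectors, and that each node's constraint (its safe zone) is a ball with its own center and radius. It relies on the fact that the average of points drawn from balls B(c_i, r_i) lies in the ball B(mean c, mean r). All function names are hypothetical.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))

def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def zones_are_valid(centers, radii, threshold):
    """Check that per-node ball constraints imply the global condition.

    If every local vector x_i stays inside B(c_i, r_i), the average of
    the x_i stays inside B(mean c, mean r). The global condition
    norm(average) <= threshold therefore holds whenever
    norm(mean c) + mean r <= threshold.
    """
    mean_radius = sum(radii) / len(radii)
    return norm(mean_vector(centers)) + mean_radius <= threshold

def local_violation(x, center, radius):
    """True if a node's local vector has left its own ball."""
    return norm([a - b for a, b in zip(x, center)]) > radius

# Two nodes whose data sit far apart (an assumed toy example).
centers = [(3.0, 0.0), (-3.0, 0.0)]
radii = [1.0, 1.0]
threshold = 2.0
readings = [(3.5, 0.5), (-2.5, -0.5)]

if zones_are_valid(centers, radii, threshold):
    silent = not any(
        local_violation(x, c, r) for x, c, r in zip(readings, centers, radii)
    )
    print("no communication needed:", silent)
```

In this toy setting, a single shared ball of radius 2 around the origin would be violated by both readings, even though their average is well within the threshold. Balls centered on each node's own data raise no violation, which is the kind of saving that tailoring constraints to heterogeneous streams aims for.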
Field | Value |
---|---|
Original language | English |
Article number | 6678505 |
Pages (from-to) | 1890-1903 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 26 |
Issue number | 8 |
DOIs | |
State | Published - Aug 2014 |
Keywords
- Heterogeneous data streams
- data modeling
- distributed streams
- geometric monitoring
- safe zones
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics