TY - GEN
T1 - A geometric approach to monitoring threshold functions over distributed data streams
AU - Sharfman, Izchak
AU - Schuster, Assaf
AU - Keren, Daniel
PY - 2006
Y1 - 2006
AB - Monitoring data streams in a distributed system has been the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved challenges, such as the important task of feature selection (e.g., by monitoring the information gain of various features), still require very high communication overhead using naive, centralized algorithms. We present a novel geometric approach by which an arbitrary global monitoring task can be split into a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables efficient monitoring of arbitrary threshold functions over distributed data streams. We present experimental results on real-world data that demonstrate that our algorithms are highly scalable and considerably reduce communication load in comparison to centralized algorithms.
KW - Data streams
KW - Distributed monitoring
UR - http://www.scopus.com/inward/record.url?scp=34250689976&partnerID=8YFLogxK
U2 - 10.1145/1142473.1142508
DO - 10.1145/1142473.1142508
M3 - Conference contribution
AN - SCOPUS:34250689976
SN - 1595934340
SN - 9781595934345
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 301
EP - 312
BT - SIGMOD 2006 - Proceedings of the ACM SIGMOD International Conference on Management of Data
T2 - 2006 ACM SIGMOD International Conference on Management of Data
Y2 - 27 June 2006 through 29 June 2006
ER -