Monitoring distributed data streams through node clustering

Maria Barouti, Daniel Keren, Jacob Kogan, Yaakov Malinovsky

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Monitoring data streams in a distributed system is a challenging problem with profound applications. The task of feature selection (e.g., by monitoring the information gain of various features) is an example of an application that requires special techniques to avoid a very high communication overhead when performed using straightforward centralized algorithms. Motivated by recent contributions based on geometric ideas, we present an alternative approach that combines system theory techniques and clustering. The proposed approach enables monitoring values of an arbitrary threshold function over distributed data streams through a set of constraints applied independently on each stream and/or clusters of streams. The clusters are designed to adapt themselves to the data stream. A correct choice of clusters yields a reduction in communication load. Unlike many clustering algorithms that attempt to collect together similar data items, monitoring requires clusters with dissimilar vectors canceling each other as much as possible. In particular, sub-clusters of a good cluster do not have to be good. This novel type of clustering dictated by the problem at hand requires development of new algorithms, and the paper is a step in this direction. We report experiments on real-world data that detect instances where communication between nodes is required, and show that the clustering approach reduces communication load.
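The remark that a good cluster collects dissimilar vectors that cancel each other can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a hypothetical local constraint of the form "the drift of a reading since the last synchronization has norm at most `radius`", applied either to each node or to the average drift of a cluster. The names `deltas`, `radius`, `violating_nodes` and `violating_clusters`, and all numbers, are invented for illustration.

```python
from math import sqrt

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return sqrt(sum(x * x for x in v))

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def violating_nodes(deltas, radius):
    """Indices of nodes whose own drift breaks the assumed local constraint."""
    return [i for i, d in enumerate(deltas) if norm(d) > radius]

def violating_clusters(deltas, clusters, radius):
    """Clusters whose averaged drift breaks the assumed constraint."""
    return [c for c in clusters
            if norm(mean([deltas[i] for i in c])) > radius]

# Hypothetical drifts: current reading minus last synchronized reading.
deltas = [[3.0, 0.0], [-3.0, 0.5], [0.0, 2.5], [0.5, -2.5]]
radius = 1.0  # assumed bound, not taken from the paper

# Constraint applied independently on each stream: every node violates.
per_node = violating_nodes(deltas, radius)

# Clusters pairing dissimilar (opposing) drifts: the drifts cancel.
cancelling = violating_clusters(deltas, [(0, 1), (2, 3)], radius)

# Clusters pairing similar drifts: both clusters still violate.
similar = violating_clusters(deltas, [(0, 3), (1, 2)], radius)

print(per_node, cancelling, similar)
```

In this toy setting, pairing opposing drifts leaves no cluster in violation, while pairing similar drifts leaves every cluster in violation. Checking a cluster still requires its members to exchange data among themselves, so any saving comes from avoiding system-wide communication.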

Original language: English
Title of host publication: Machine Learning and Data Mining in Pattern Recognition - 10th International Conference, MLDM 2014, Proceedings
Publisher: Springer Verlag
Pages: 149-162
Number of pages: 14
ISBN (Print): 9783319089782
State: Published - 2014
Event: 10th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2014 - St. Petersburg, Russian Federation
Duration: 21 Jul 2014 to 24 Jul 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8556 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2014
Country/Territory: Russian Federation
City: St. Petersburg
Period: 21/07/14 to 24/07/14

Keywords

  • clustering
  • convex analysis
  • data streams
  • distributed system

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
