Clustering for monitoring distributed data streams

Maria Barouti, Jacob Kogan, Yaakov Malinovsky, Daniel Keren

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Monitoring data streams in a distributed system is a challenging problem with profound applications. The task of feature selection (e.g., by monitoring the information gain of various features) is an example of an application that requires special techniques to avoid a very high communication overhead when performed using straightforward centralized algorithms.

Motivated by recent contributions based on geometric ideas, we present an alternative approach that combines system theory techniques and clustering. The proposed approach enables monitoring values of an arbitrary threshold function over distributed data streams through a set of constraints applied independently on each stream and/or clusters of streams. The clusters are designed to evolve in time and to adapt themselves to the data stream. A correct choice of clusters yields a reduction in communication load. Unlike many clustering algorithms that attempt to collect together similar data items, monitoring requires clusters with dissimilar vectors canceling each other as much as possible. In particular, sub-clusters of a good cluster do not have to be good. This novel type of clustering dictated by the problem at hand requires development of new algorithms and/or modification of the existing ones, and the chapter is a step in this direction.

We report experiments on real-world data with a newly devised clustering algorithm. The experiments detect instances where communication between nodes is required, and show that the clustering approach reduces communication load. We then propose an application of well-known clustering algorithms to the monitoring problem.
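The remark that dissimilar vectors should cancel each other, and that sub-clusters of a good cluster need not be good, can be illustrated with a small sketch. This is not the chapter's algorithm: the constraint form (each node, or each cluster mean, must stay within a fixed radius of its last synchronized value), the function names `violating_nodes` and `cluster_ok`, and the sample numbers are all assumptions made for illustration only.

```python
import numpy as np

def violating_nodes(deltas, radius):
    """Indices of nodes whose own drift breaks the (assumed) local constraint."""
    norms = np.linalg.norm(deltas, axis=1)
    return [i for i, n in enumerate(norms) if n >= radius]

def cluster_ok(deltas, members, radius):
    """True if the mean drift of the cluster stays inside the (assumed) radius."""
    mean_drift = deltas[members].mean(axis=0)
    return bool(np.linalg.norm(mean_drift) < radius)

# Hypothetical drift of three nodes since the last synchronization.
deltas = np.array([[1.5, 0.0],
                   [-1.4, 0.1],
                   [0.2, 0.1]])
radius = 1.0

print(violating_nodes(deltas, radius))     # nodes 0 and 1 violate on their own
print(cluster_ok(deltas, [0, 1], radius))  # their drifts nearly cancel
print(cluster_ok(deltas, [0], radius))     # a sub-cluster of that cluster fails
```

Under these assumptions, nodes 0 and 1 would each trigger communication if monitored independently, while the cluster {0, 1} satisfies its constraint because the two drifts point in nearly opposite directions. The single-node sub-cluster {0} fails, which matches the observation that a good cluster can contain sub-clusters that are not good.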

Original language: English
Title of host publication: Partitional Clustering Algorithms
Publisher: Springer International Publishing
Pages: 387-415
Number of pages: 29
ISBN (Electronic): 9783319092591
ISBN (Print): 9783319092584
DOIs
State: Published - 1 Jan 2015

Bibliographical note

Publisher Copyright:
© Springer International Publishing Switzerland 2015.

Keywords

  • Adaptive stream mining
  • Clustering
  • Data streams
  • Distributed system

ASJC Scopus subject areas

  • General Engineering
