A generic local algorithm for mining data streams in large distributed systems

Ran Wolff, Kanishka Bhaduri, Hillol Kargupta

Research output: Contribution to journal › Article › peer-review

Abstract

In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality, such as message routing, information retrieval, and load sharing, relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, it is necessary to keep the models up to date. Computing global data mining models (e.g., decision trees, k-means clustering) in large distributed systems may be very costly due to the scale of the system and the potentially high communication cost. The cost increases further in a dynamic scenario in which the data changes rapidly. In this paper, we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm that can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data, such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.

Original language: English
Article number: 4604665
Pages (from-to): 465-478
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 21
Issue number: 4
DOIs
State: Published - Apr 2009

Bibliographical note

Funding Information:
A preliminary version of this work was published in the Proceedings of the 2006 SIAM Data Mining Conference (SDM ’06). This work was done when Kanishka Bhaduri was at UMBC. This research was supported by the US National Science Foundation CAREER Award IIS-0093353 and NASA Grant NNX07AV70G.

Keywords

  • Distributed data mining
  • Local algorithms
  • Peer-to-peer

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
