Monitoring least squares models of distributed streams

Moshe Gabel, Daniel Keren, Assaf Schuster

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Least squares regression is widely used to understand and predict data behavior in many fields. As data evolves, regression models must be recomputed, and indeed much work has focused on quick, efficient and accurate computation of linear regression models. In distributed streaming settings, however, periodically recomputing the global model is wasteful: communicating new observations or model updates is required even when the model is, in practice, unchanged. This is prohibitive in many settings, such as in wireless sensor networks, or when the number of nodes is very large. The alternative, monitoring prediction accuracy, is not always sufficient: in some settings, for example, we are interested in the model's coefficients, rather than its predictions. We propose the first monitoring algorithm for multivariate regression models of distributed data streams that guarantees a bounded model error. It maintains an accurate estimate using a fraction of the communication by recomputing only when the precomputed model is sufficiently far from the (hypothetical) current global model. When the global model is stable, no communication is needed. Experiments on real and synthetic datasets show that our approach reduces communication by up to two orders of magnitude while providing an accurate estimate of the current global model in all nodes.
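The condition described in the abstract, recomputing only when the precomputed model is far from the current global model, can be illustrated with a small sketch. This is not the authors' algorithm: their method checks the condition without gathering data centrally, whereas this sketch sums per-node statistics in one place purely to show what is being monitored. The function names (`local_stats`, `global_model`, `needs_resync`), the Euclidean distance, and the threshold `eps` are all assumptions made for illustration.

```python
import math

def local_stats(xs, ys):
    """Per-node sufficient statistics: A = sum of x x^T, c = sum of x * y."""
    d = len(xs[0])
    A = [[0.0] * d for _ in range(d)]
    c = [0.0] * d
    for x, y in zip(xs, ys):
        for i in range(d):
            c[i] += x[i] * y
            for j in range(d):
                A[i][j] += x[i] * x[j]
    return A, c

def solve(A, c):
    """Solve A b = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    b = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * b[k] for k in range(i + 1, n))
        b[i] = (M[i][n] - s) / M[i][i]
    return b

def global_model(node_stats):
    """Least squares coefficients from the summed statistics of all nodes."""
    d = len(node_stats[0][1])
    A = [[sum(s[0][i][j] for s in node_stats) for j in range(d)]
         for i in range(d)]
    c = [sum(s[1][i] for s in node_stats) for i in range(d)]
    return solve(A, c)

def needs_resync(beta_ref, beta_now, eps):
    """True when the reference model has drifted more than eps (assumed
    Euclidean distance) from the current global model."""
    return math.dist(beta_ref, beta_now) > eps
```

In this sketch each node would only need to report its statistics when `needs_resync` fires; how to detect that locally, without computing `beta_now` centrally, is the contribution the abstract describes and is not reproduced here.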

Original language: English
Title of host publication: KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 319-328
Number of pages: 10
ISBN (Electronic): 9781450336642
State: Published - 10 Aug 2015
Event: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, Australia
Duration: 10 Aug 2015 - 13 Aug 2015

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 2015-August

Conference

Conference: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Country/Territory: Australia
City: Sydney
Period: 10/08/15 - 13/08/15

Keywords

  • Data mining
  • Distributed streams
  • Least squares
  • Regression

ASJC Scopus subject areas

  • Software
  • Information Systems
