Scalable approximate query tracking over highly distributed data streams with tunable accuracy guarantees

Nikos Giatrakos, Antonios Deligiannakis, Minos Garofalakis, Daniel Keren, Vasilis Samoladas

Research output: Contribution to journal › Article › peer-review

Abstract

The recently proposed Geometric Monitoring (GM) method has provided a general tool for the distributed monitoring of arbitrary non-linear queries over streaming data observed by a collection of remote sites, with numerous practical applications. Unfortunately, GM-based techniques can suffer from serious scalability issues with increasing numbers of remote sites. In this paper, we propose novel techniques that effectively tackle these scalability problems by exploiting a carefully designed sample of the remote sites for efficient approximate query tracking. Our novel sampling-based scheme utilizes a sample of cardinality proportional to √N (compared to N for the original GM and its variants), where N is the number of sites in the network, to perform the monitoring process. Our extensive experimental evaluation and comparative analysis over a variety of real-life data streams demonstrate that our sampling-based techniques can significantly reduce the communication cost during distributed monitoring with controllable, predefined accuracy guarantees. In this way, we scale the monitoring of any given non-linear function to network sizes that no GM-related method or variant had previously reached.

Original language: English
Pages (from-to): 59-87
Number of pages: 29
Journal: Information Systems
Volume: 76
State: Published - Jul 2018

Bibliographical note

Funding Information:
This work was partially supported by the European Commission under the FP7 grant FERARI (no. 619491).

Publisher Copyright:
© 2018 Elsevier Ltd

Keywords

  • Data streams
  • Distributed function tracking
  • Sampling

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
