Distributed threshold querying of general functions by a difference of monotonic representation

Guy Sagy, Daniel Keren, Izchak Sharfman, Assaf Schuster

Research output: Contribution to journal › Article › peer-review

Abstract

The goal of a threshold query is to detect all objects whose score exceeds a given threshold. This type of query is used in many settings, such as data mining, event triggering, and top-k selection. Often, threshold queries are performed over distributed data. Given database relations that are distributed over many nodes, an object's score is computed by aggregating the value of each attribute, applying a given scoring function over the aggregation, and thresholding the function's value. However, joining all the distributed relations to a central database might incur prohibitive overheads in bandwidth, CPU, and storage accesses. Efficient algorithms required to reduce these costs exist only for monotonic aggregation threshold queries and certain specific scoring functions. We present a novel approach for efficiently performing general distributed threshold queries. To the best of our knowledge, this is the first solution to the problem of performing such queries with general scoring functions. We first present a solution for monotonic functions, and then introduce a technique to solve for other functions by representing them as a difference of monotonic functions. Experiments with real-world data demonstrate the method's effectiveness in achieving low communication and access costs.
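The difference-of-monotonic idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, only the elementary bounding fact that such a representation makes available: if f = m1 - m2, where m1 and m2 are both non-decreasing in every coordinate, and an object's aggregated attribute vector is known to lie in a box lo <= x <= hi, then m1(lo) - m2(hi) <= f(x) <= m1(hi) - m2(lo). The function names, the "report"/"prune"/"undecided" labels, and the example scoring function (x - y)^2 are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Monotone = Callable[[Vector], float]


def dm_bounds(m1: Monotone, m2: Monotone, lo: Vector, hi: Vector) -> Tuple[float, float]:
    """Bound f = m1 - m2 over the box lo <= x <= hi.

    Assumes m1 and m2 are non-decreasing in every coordinate, so each
    attains its minimum at lo and its maximum at hi.
    """
    lower = m1(lo) - m2(hi)
    upper = m1(hi) - m2(lo)
    return lower, upper


def classify(m1: Monotone, m2: Monotone, lo: Vector, hi: Vector, threshold: float) -> str:
    """Decide, from bounds alone, whether the score can exceed the threshold."""
    lower, upper = dm_bounds(m1, m2, lo, hi)
    if lower > threshold:
        return "report"      # score certainly exceeds the threshold
    if upper <= threshold:
        return "prune"       # score certainly does not exceed it
    return "undecided"       # tighter bounds on the attributes are needed


# Hypothetical scoring function f(x, y) = (x - y)^2, which is not monotonic.
# On non-negative inputs it equals (x^2 + y^2) - 2xy, and both terms are
# non-decreasing in x and y there.
def m1(v: Vector) -> float:
    return v[0] ** 2 + v[1] ** 2


def m2(v: Vector) -> float:
    return 2 * v[0] * v[1]


lo, hi = (5.0, 1.0), (6.0, 2.0)
print(dm_bounds(m1, m2, lo, hi))       # bounds on f over the box
print(classify(m1, m2, lo, hi, 1.0))   # lower bound already above 1
print(classify(m1, m2, lo, hi, 10.0))  # bounds straddle 10
```

The bounds are valid but can be loose: on this box the true range of (x - y)^2 is [9, 25], while the sketch returns a wider interval. In a distributed setting the box would presumably shrink as more attribute values are retrieved from the nodes, which is what lets an object be reported or pruned without fetching all of its data.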

Original language: English
Pages (from-to): 46-57
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 4
Issue number: 2
State: Published - Nov 2010

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science
