Communication-efficient distributed mining of association rules

A. Schuster, R. Wolff

Research output: Contribution to journal › Conference article › peer-review


Mining for associations between items in large transactional databases is a central problem in the field of knowledge discovery. When the database is partitioned among several share-nothing machines, the problem can be addressed using distributed data mining algorithms. One such algorithm, called CD, was proposed by Agrawal and Shafer in [1] and was later enhanced by the FDM algorithm of Cheung, Han et al. [5]. The main problem with these algorithms is that they do not scale well with the number of partitions. They are thus impractical for use in modern distributed environments such as peer-to-peer systems, in which hundreds or thousands of computers may interact. In this paper we present a set of new algorithms that solve the Distributed Association Rule Mining problem using far less communication. In addition to being very efficient, the new algorithms are also extremely robust. Unlike existing algorithms, they continue to be efficient even when the data is skewed or the partition sizes are imbalanced. We present both experimental and theoretical results concerning the behavior of these algorithms and explain how they can be implemented in different settings.
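
As a rough illustration of why the baseline approach scales poorly with the number of partitions, the sketch below simulates one counting pass in the style of CD as it is commonly described: every partition reports a local support count for every candidate itemset, and the counts are summed globally. This is not the authors' code and does not show the paper's new algorithms; the function names, the toy data, and the `messages` counter are assumptions made for illustration only.

```python
from collections import Counter

def local_counts(transactions, candidates):
    """Count, on one partition, how many transactions contain each candidate."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        for c in candidates:
            if c <= items:  # candidate itemset is a subset of the transaction
                counts[c] += 1
    return counts

def count_distribution_pass(partitions, candidates, min_support):
    """One CD-style pass (illustrative): sum per-partition counts globally.

    Returns the globally frequent candidates and the number of counts
    exchanged, which is len(partitions) * len(candidates).
    """
    total = Counter()
    messages = 0  # hypothetical cost measure: one count per candidate per partition
    for part in partitions:
        counts = local_counts(part, candidates)
        for c in candidates:
            total[c] += counts[c]
            messages += 1
    n = sum(len(p) for p in partitions)
    frequent = {c for c in candidates if total[c] >= min_support * n}
    return frequent, messages

# Toy data: two partitions with two transactions each (assumed for illustration).
partitions = [
    [["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"]],
]
candidates = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
frequent, messages = count_distribution_pass(partitions, candidates, min_support=0.5)
```

In this sketch the number of exchanged counts grows linearly with the number of partitions even for candidates that are rare everywhere, which is the kind of communication cost the abstract says the new algorithms reduce.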

Original language: English
Pages (from-to): 473-484
Number of pages: 12
Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
State: Published - 2001
Externally published: Yes
Event: 2001 ACM SIGMOD International Conference on Management of Data - Santa Barbara, CA, United States
Duration: 21 May 2001 to 24 May 2001

ASJC Scopus subject areas

  • Software
  • Information Systems

