A high-performance distributed algorithm for mining association rules

Assaf Schuster, Ran Wolff, Dan Trock

Research output: Contribution to journal › Article › peer-review

Abstract

We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown in the corresponding sequential algorithm. As a result of this tighter bound and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms: the same order of magnitude as the optimum. Springer-Verlag London Ltd.

Original language: English
Pages (from-to): 458-475
Number of pages: 18
Journal: Knowledge and Information Systems
Volume: 7
Issue number: 4
State: Published - May 2005
Externally published: Yes

Keywords

  • Association rule
  • Data mining
  • Distributed data mining
  • High-performance computing

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence
