Abstract
We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. It is the first D-ARM algorithm to perform a single scan over the database, and as such its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown for the corresponding sequential algorithm. As a result of this tighter bound, and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms: the number of candidates is of the same order of magnitude as the optimum. Springer-Verlag London Ltd.
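For readers unfamiliar with the terms in the abstract, the following is a minimal Python sketch of two of them: globally frequent itemsets in a horizontally partitioned database, and superlinear speed-up. It is not the algorithm presented in the paper. It illustrates only the generic idea of summing per-node support counts, and all names (`local_counts`, `global_frequent`, `is_superlinear`) and the toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-itemset occurring in one node's local partition."""
    counts = Counter()
    for transaction in partition:
        # Sort and deduplicate so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def global_frequent(partitions, k, min_support):
    """Sum local counts over all nodes; keep itemsets whose global
    support (fraction of all transactions) is at least min_support."""
    total = Counter()
    for partition in partitions:
        total.update(local_counts(partition, k))
    n_transactions = sum(len(p) for p in partitions)
    return {s: c for s, c in total.items() if c / n_transactions >= min_support}


def is_superlinear(time_one_node, time_n_nodes, n_nodes):
    """Speed-up is superlinear when T(1) / T(n) exceeds n."""
    return time_one_node / time_n_nodes > n_nodes


# Hypothetical toy database split across two nodes.
partitions = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["milk"], ["bread", "milk", "eggs"]],
]
frequent_pairs = global_frequent(partitions, k=2, min_support=0.5)
```

In this toy data only the pair `("bread", "milk")` reaches the 50% threshold, since it appears in three of the four transactions. A real D-ARM algorithm would avoid enumerating and exchanging every candidate in this way; the sketch only fixes the vocabulary.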
Original language | English |
---|---|
Pages (from-to) | 458-475 |
Number of pages | 18 |
Journal | Knowledge and Information Systems |
Volume | 7 |
Issue number | 4 |
State | Published - May 2005 |
Externally published | Yes |
Keywords
- Association rule
- Data mining
- Distributed data mining
- High-performance computing
ASJC Scopus subject areas
- Software
- Information Systems
- Human-Computer Interaction
- Hardware and Architecture
- Artificial Intelligence