TY - GEN
T1 - Association rule mining in peer-to-peer systems
AU - Wolff, Ran
AU - Schuster, Assaf
PY - 2003
Y1 - 2003
AB - We extend the problem of association rule mining, a key data mining problem, to systems in which the database is partitioned among a very large number of computers that are dispersed over a wide area. Such computing systems include GRID computing platforms, federated database systems, and peer-to-peer computing environments. The scale of these systems poses several difficulties, such as the impracticality of global communications and global synchronization, dynamic topology changes of the network, on-the-fly data updates, the need to share resources with other applications, and the frequent failure and recovery of resources. We present an algorithm by which every node in the system can reach the exact solution, as if it were given the combined database. The algorithm is entirely asynchronous, imposes very little communication overhead, transparently tolerates network topology changes and node failures, and quickly adjusts to changes in the data as they occur. Simulations of up to 10,000 nodes show that the algorithm is local: all rules, except for those whose confidence is approximately equal to the confidence threshold, are discovered using information gathered from a very small vicinity, whose size is independent of the size of the system.
UR - http://www.scopus.com/inward/record.url?scp=33751085446&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33751085446
SN - 0769519784
SN - 9780769519784
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 363
EP - 370
BT - Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
T2 - 3rd IEEE International Conference on Data Mining, ICDM '03
Y2 - 19 November 2003 through 22 November 2003
ER -