TY - GEN
T1 - Local L2-thresholding based data mining in peer-to-peer systems
AU - Wolff, Ran
AU - Bhaduri, Kanishka
AU - Kargupta, Hillol
PY - 2006
Y1 - 2006
N2 - In a large network of computers, wireless sensors, or mobile devices, each of the components (henceforth, peers) has some data about the global status of the system. Many of the functions of the system, such as routing decisions, search strategies, data cleansing, and the assignment of mutual trust, depend on the global status. Therefore, it is essential that the system be able to detect, and react to, changes in its global status. Computing global predicates in such systems is usually very costly, mainly because of their scale and, in some cases (e.g., sensor networks), also because of the high cost of communication. The cost further increases when the data changes rapidly (due to state changes, node failure, etc.) and computation has to follow these changes. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which detects when the L2 norm of the average data surpasses a threshold. Then, we use this algorithm as a feedback loop for the monitoring of complex predicates on the data - such as the data's k-means clustering. The efficiency of the L2 algorithm guarantees that so long as the clustering results represent the data (i.e., the data is stationary) few resources are required. When the data undergoes an epoch change - a change in the underlying distribution - and the model no longer represents it, the feedback loop indicates this and the model is rebuilt. Furthermore, the existence of a feedback loop allows using approximate and "best-effort" methods for constructing the model; if an ill-fitting model is built, the feedback loop would indicate so, and the model would be rebuilt.
AB - In a large network of computers, wireless sensors, or mobile devices, each of the components (henceforth, peers) has some data about the global status of the system. Many of the functions of the system, such as routing decisions, search strategies, data cleansing, and the assignment of mutual trust, depend on the global status. Therefore, it is essential that the system be able to detect, and react to, changes in its global status. Computing global predicates in such systems is usually very costly, mainly because of their scale and, in some cases (e.g., sensor networks), also because of the high cost of communication. The cost further increases when the data changes rapidly (due to state changes, node failure, etc.) and computation has to follow these changes. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which detects when the L2 norm of the average data surpasses a threshold. Then, we use this algorithm as a feedback loop for the monitoring of complex predicates on the data - such as the data's k-means clustering. The efficiency of the L2 algorithm guarantees that so long as the clustering results represent the data (i.e., the data is stationary) few resources are required. When the data undergoes an epoch change - a change in the underlying distribution - and the model no longer represents it, the feedback loop indicates this and the model is rebuilt. Furthermore, the existence of a feedback loop allows using approximate and "best-effort" methods for constructing the model; if an ill-fitting model is built, the feedback loop would indicate so, and the model would be rebuilt.
UR - http://www.scopus.com/inward/record.url?scp=33745461062&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972764.38
DO - 10.1137/1.9781611972764.38
M3 - Conference contribution
AN - SCOPUS:33745461062
SN - 089871611X
SN - 9780898716115
T3 - Proceedings of the Sixth SIAM International Conference on Data Mining
SP - 430
EP - 441
BT - Proceedings of the Sixth SIAM International Conference on Data Mining
PB - Society for Industrial and Applied Mathematics
T2 - Sixth SIAM International Conference on Data Mining
Y2 - 20 April 2006 through 22 April 2006
ER -