TY - GEN
T1 - Mining for misconfigured machines in grid systems
AU - Palatin, Noam
AU - Leizarowitz, Arie
AU - Schuster, Assaf
AU - Wolff, Ran
PY - 2006
Y1 - 2006
AB - Grid systems are proving increasingly useful for managing the batch computing jobs of organizations. One well-known example is Intel, whose internally developed NetBatch system manages tens of thousands of machines. The size, heterogeneity, and complexity of grid systems make them very difficult, however, to configure. This often results in misconfigured machines, which may adversely affect the entire system. We investigate a distributed data mining approach for detection of misconfigured machines. Our Grid Monitoring System (GMS) non-intrusively collects data from all sources (log files, system services, etc.) available throughout the grid system. It converts raw data to semantically meaningful data and stores this data on the machine it was obtained from, limiting incurred overhead and allowing scalability. Afterwards, when analysis is requested, a distributed outliers detection algorithm is employed to identify misconfigured machines. The algorithm itself is implemented as a recursive workflow of grid jobs. It is especially suited to grid systems, in which the machines might be unavailable most of the time and often fail altogether.
KW - Distributed Data Mining
KW - Grid Information System
KW - Grid Systems
KW - Outliers Detection
KW - System Monitoring
UR - http://www.scopus.com/inward/record.url?scp=33749559435&partnerID=8YFLogxK
U2 - 10.1145/1150402.1150488
DO - 10.1145/1150402.1150488
M3 - Conference contribution
AN - SCOPUS:33749559435
SN - 1595933395
SN - 9781595933393
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 687
EP - 692
BT - KDD 2006
PB - Association for Computing Machinery (ACM)
T2 - KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Y2 - 20 August 2006 through 23 August 2006
ER -