TY - GEN
T1 - Multi-way distributional clustering via pairwise interactions
AU - Bekkerman, Ron
AU - El-Yaniv, Ran
AU - McCallum, Andrew
PY - 2005
Y1 - 2005
N2 - We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated with the aim of maximizing an objective function that measures multiple pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering, we present an extensive empirical study of two-way, three-way and four-way applications of our scheme using six real-world datasets, including the 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.
AB - We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated with the aim of maximizing an objective function that measures multiple pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering, we present an extensive empirical study of two-way, three-way and four-way applications of our scheme using six real-world datasets, including the 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.
UR - http://www.scopus.com/inward/record.url?scp=31844439558&partnerID=8YFLogxK
U2 - 10.1145/1102351.1102357
DO - 10.1145/1102351.1102357
M3 - Conference contribution
AN - SCOPUS:31844439558
SN - 1595931805
T3 - ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
SP - 41
EP - 48
BT - ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
A2 - Raedt, L.
A2 - Wrobel, S.
T2 - ICML 2005: 22nd International Conference on Machine Learning
Y2 - 7 August 2005 through 11 August 2005
ER -