Multi-way distributional clustering via pairwise interactions

Ron Bekkerman, Ran El-Yaniv, Andrew McCallum

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated with the aim of maximizing an objective function that measures multiple pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering, we present an extensive empirical study of two-way, three-way and four-way applications of our scheme using six real-world datasets, including the 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.
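As a rough illustration of the objective described in the abstract, the sketch below sums the mutual information between cluster variables over every observed pair of types. It is not the authors' implementation: it assumes hard cluster assignments, an unweighted sum over pairs, and co-occurrence counts stored in plain dictionaries. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cluster_mutual_information(counts, row_clusters, col_clusters):
    """Mutual information (in bits) between two cluster variables.

    counts: dict mapping (row_item, col_item) -> co-occurrence count
    row_clusters, col_clusters: dict mapping item -> cluster id (hard assignment)
    """
    # Aggregate item-level counts into a cluster-level joint table.
    joint = defaultdict(float)
    total = 0.0
    for (r, c), n in counts.items():
        joint[(row_clusters[r], col_clusters[c])] += n
        total += n
    if total == 0:
        return 0.0

    # Marginal distributions of the two cluster variables.
    row_marg = defaultdict(float)
    col_marg = defaultdict(float)
    for (rc, cc), n in joint.items():
        row_marg[rc] += n / total
        col_marg[cc] += n / total

    # I(A; B) = sum over cells of p(a, b) * log2(p(a, b) / (p(a) * p(b))).
    mi = 0.0
    for (rc, cc), n in joint.items():
        p = n / total
        if p > 0:
            mi += p * math.log2(p / (row_marg[rc] * col_marg[cc]))
    return mi


def pairwise_objective(tables, clusterings):
    """Assumed objective: unweighted sum of cluster-level mutual information.

    tables: dict mapping (type_a, type_b) -> counts dict for that pair of types
    clusterings: dict mapping type name -> item -> cluster id
    """
    return sum(
        cluster_mutual_information(counts, clusterings[a], clusterings[b])
        for (a, b), counts in tables.items()
    )


# Toy two-way example: two documents, each containing a different word.
tables = {("docs", "words"): {("d1", "w1"): 1, ("d2", "w2"): 1}}
separate = {"docs": {"d1": 0, "d2": 1}, "words": {"w1": 0, "w2": 1}}
merged = {"docs": {"d1": 0, "d2": 0}, "words": {"w1": 0, "w2": 1}}

score_separate = pairwise_objective(tables, separate)
score_merged = pairwise_objective(tables, merged)
```

In this toy case, keeping the two documents in separate clusters preserves all of the document-word dependence, while merging them into one cluster discards it, so the first clustering scores higher under the assumed objective. A three-way or four-way variant would add further entries to `tables`, for example a hypothetical `("docs", "authors")` pair.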

Original language: English
Title of host publication: ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
Editors: L. De Raedt, S. Wrobel
Pages: 41-48
Number of pages: 8
State: Published - 2005
Externally published: Yes
Event: ICML 2005: 22nd International Conference on Machine Learning - Bonn, Germany
Duration: 7 Aug 2005 → 11 Aug 2005

Publication series

Name: ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning

Conference

Conference: ICML 2005: 22nd International Conference on Machine Learning
Country/Territory: Germany
City: Bonn
Period: 7/08/05 → 11/08/05

ASJC Scopus subject areas

  • General Engineering
