On feature distributional clustering for text categorization

Ron Bekkerman, Ran El-Yaniv, Yoad Winter, Naftali Tishby

Research output: Contribution to journal › Conference article › peer-review

Abstract

We describe a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently introduced information bottleneck method, which generates a more efficient word-cluster representation of documents. Combined with the classification power of an SVM, this method yields high-performance text categorization that can outperform other recent methods in terms of categorization accuracy and representation efficiency. Comparing the accuracy of our method with other techniques, we observe a significant dependency of the results on the data set. We discuss the potential reasons for this dependency.

Original language: English
Pages (from-to): 146-153
Number of pages: 8
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
State: Published - 2001
Externally published: Yes
Event: 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - New Orleans, LA, United States
Duration: 9 Sep 2001 → 13 Sep 2001

Bibliographical note

Funding Information:
The work was supported by the strategic grant POSDRU/159/1.5/S/137750, Project "Postdoctoral programme for training scientific researchers" co-financed by the European Social Foundation within the Sectorial Operational Program Human Resources Development 2007-2013. Prof. Simona M. Coman kindly acknowledges UEFISCDI for the financial support (project PN-II-PCCA-2011-3.2-1367, Nr. 31/2012).

ASJC Scopus subject areas

  • Management Information Systems
  • Hardware and Architecture
