Abstract
We describe a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently introduced information bottleneck method, which generates a more efficient word-cluster representation of documents. Combined with the classification power of an SVM, this method yields high-performance text categorization that can outperform other recent methods in terms of categorization accuracy and representation efficiency. Comparing the accuracy of our method with that of other techniques, we observe a significant dependency of the results on the data set. We discuss the potential reasons for this dependency.
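To make the idea of a word-cluster representation concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a simplified greedy agglomerative variant of information bottleneck clustering, in which the two word clusters whose merge loses the least information about the category are merged repeatedly. All function names and the toy counts are hypothetical. The SVM step is omitted; the resulting cluster-count vectors would be passed to any SVM implementation.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) in bits; q must be positive wherever p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w1, p1, w2, p2):
    """Information about the category lost by merging two clusters:
    (w1 + w2) times the weighted Jensen-Shannon divergence of p1 and p2."""
    total = w1 + w2
    a, b = w1 / total, w2 / total
    mix = [a * x + b * y for x, y in zip(p1, p2)]
    return total * (a * kl(p1, mix) + b * kl(p2, mix))


def cluster_words(word_class_counts, k):
    """Greedily merge words into k clusters.

    word_class_counts maps each word to a list of per-category counts
    (assumed to have a positive sum). Returns a dict word -> cluster id.
    """
    grand = sum(sum(c) for c in word_class_counts.values())
    # Each cluster is (member words, prior weight, p(category | cluster)).
    clusters = []
    for word, counts in word_class_counts.items():
        n = sum(counts)
        clusters.append(([word], n / grand, [c / n for c in counts]))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i][1], clusters[i][2],
                                  clusters[j][1], clusters[j][2])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, wi, pi), (mj, wj, pj) = clusters[i], clusters[j]
        total = wi + wj
        merged = (mi + mj, total,
                  [(wi * x + wj * y) / total for x, y in zip(pi, pj)])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)

    return {w: cid for cid, (members, _, _) in enumerate(clusters)
            for w in members}


def to_cluster_vector(tokens, assignment, k):
    """Represent a document as counts over word clusters, not over words."""
    vec = [0] * k
    for token in tokens:
        if token in assignment:  # words outside the vocabulary are skipped
            vec[assignment[token]] += 1
    return vec


# Toy data: counts of each word in two hypothetical categories.
counts = {"goal": [9, 1], "match": [8, 2], "stock": [1, 9], "bond": [2, 8]}
assignment = cluster_words(counts, k=2)
doc_vector = to_cluster_vector(["goal", "stock", "goal", "unseen"], assignment, 2)
```

In this toy setup, words with similar category distributions end up in the same cluster, so a document is described by a handful of cluster counts instead of one count per vocabulary word. The pairwise search is quadratic in the number of clusters at every step, so it only illustrates the idea and would not scale to a realistic vocabulary.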
Original language | English |
---|---|
Pages (from-to) | 146-153 |
Number of pages | 8 |
Journal | SIGIR Forum (ACM Special Interest Group on Information Retrieval) |
State | Published - 2001 |
Externally published | Yes |
Event | 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA, United States. Duration: 9 Sep 2001 → 13 Sep 2001 |
Bibliographical note
Funding Information: The work was supported by the strategic grant POSDRU/159/1.5/S/137750, Project "Postdoctoral programme for training scientific researchers", co-financed by the European Social Foundation within the Sectorial Operational Program Human Resources Development 2007–2013. Prof. Simona M. Coman kindly acknowledges UEFISCDI for the financial support (project PN-II-PCCA-2011-3.2-1367, Nr. 31/2012).
ASJC Scopus subject areas
- Management Information Systems
- Hardware and Architecture