Web page clustering using heuristic search in the web graph

Ron Bekkerman, Shlomo Zilberstein, James Allan

Research output: Contribution to journal › Conference article › peer-review

Abstract

Effective representation of Web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but documents can be entirely dissimilar and still correspond to the same meaning of the query. To overcome this problem, we exploit the thematic locality of the Web: relevant Web pages are often located close to each other in the Web graph of hyperlinks. We estimate the level of relevance between each pair of retrieved pages by the length of a path between them. The path is constructed using multi-agent beam search: each agent starts with one Web page and attempts to meet as many other agents as possible within bounded resources. We test the system on two types of queries: ambiguous English words and people's names. The Web appears to be tightly connected: about 70% of the agents meet each other after only three iterations of exhaustive breadth-first search. However, when heuristics are applied, the search becomes more focused and the obtained results are substantially more accurate. Combined with a content-driven Web page clustering technique, our heuristic search system significantly improves the clustering results.
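The pairwise path-length estimate described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal, hypothetical version under stated assumptions: the Web graph is a Python dict mapping each page to its neighbors, link direction is ignored (so neighbor lists should contain both in- and out-links), an agent's resources are bounded by a number of iterations and an optional beam width, and `score` is a hypothetical heuristic supplied by the caller. With `beam_width=None` each agent performs exhaustive breadth-first search; otherwise it keeps only the best-scoring newly reached pages. When two agents first reach a common page, the sketch records the shortest combined depth over their shared pages as the estimated path length.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: page -> list of linked pages.
# Link direction is ignored, so each list should hold both out- and in-links.
Graph = Dict[str, List[str]]


def _record_meetings(visited: List[Dict[str, int]],
                     distances: Dict[Tuple[int, int], int]) -> None:
    """For each agent pair that has not met yet, store the smallest
    combined depth over pages that both agents have reached."""
    n = len(visited)
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in distances:
                continue  # keep only the first meeting of each pair
            shared = visited[i].keys() & visited[j].keys()
            if shared:
                distances[(i, j)] = min(visited[i][p] + visited[j][p] for p in shared)


def multi_agent_beam_search(
    graph: Graph,
    start_pages: Sequence[str],
    iterations: int,
    beam_width: Optional[int] = None,
    score: Optional[Callable[[str], float]] = None,
) -> Dict[Tuple[int, int], int]:
    """Return estimated path lengths for pairs of agents (by index) that meet.

    beam_width=None gives exhaustive breadth-first expansion; otherwise each
    agent keeps only the beam_width highest-scoring new pages per iteration.
    """
    # visited[a] maps page -> depth at which agent a first reached it.
    visited: List[Dict[str, int]] = [{p: 0} for p in start_pages]
    frontiers: List[List[str]] = [[p] for p in start_pages]
    distances: Dict[Tuple[int, int], int] = {}
    _record_meetings(visited, distances)  # agents sharing a start page meet at 0

    for depth in range(1, iterations + 1):
        for a in range(len(start_pages)):
            candidates: List[str] = []
            seen = set()
            for page in frontiers[a]:
                for nxt in graph.get(page, []):
                    if nxt not in visited[a] and nxt not in seen:
                        seen.add(nxt)
                        candidates.append(nxt)
            if beam_width is not None:
                # Heuristic pruning with a hypothetical score function;
                # the sort is stable, so ties keep discovery order.
                key = score if score is not None else (lambda _page: 0.0)
                candidates = sorted(candidates, key=key, reverse=True)[:beam_width]
            for page in candidates:
                visited[a][page] = depth
            frontiers[a] = candidates
        _record_meetings(visited, distances)
    return distances


if __name__ == "__main__":
    # Toy graph: a - b - c - d is a chain, e is isolated.
    web = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
    print(multi_agent_beam_search(web, ["a", "d", "e"], iterations=2))
    # {(0, 1): 3}
```

In this toy run, the agents starting at `a` and `d` meet after two iterations, giving the true path length 3, while the agent on the isolated page `e` meets no one. Under exhaustive expansion the first meeting yields the shortest path length; with a narrow beam, agents may meet later or not at all, so the value is only an estimate whose quality depends on the heuristic.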

Original language: English
Pages (from-to): 2280-2285
Number of pages: 6
Journal: IJCAI International Joint Conference on Artificial Intelligence
State: Published - 2007
Externally published: Yes
Event: 20th International Joint Conference on Artificial Intelligence, IJCAI 2007 - Hyderabad, India
Duration: 6 Jan 2007 to 12 Jan 2007

ASJC Scopus subject areas

  • Artificial Intelligence
