TY - GEN
T1 - Contextual search and name disambiguation in email using graphs
AU - Minkov, Einat
AU - Cohen, William W.
AU - Ng, Andrew Y.
PY - 2006
Y1 - 2006
N2 - Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closely connected to other documents, as well as other non-textual objects: for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph. The suggested framework is evaluated for two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by use of appropriate learning methods.
KW - Email
KW - Graph-based retrieval
KW - Name disambiguation
KW - Threading
UR - http://www.scopus.com/inward/record.url?scp=33750347523&partnerID=8YFLogxK
U2 - 10.1145/1148170.1148179
DO - 10.1145/1148170.1148179
M3 - Conference contribution
AN - SCOPUS:33750347523
SN - 1595933697
SN - 9781595933690
T3 - Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 27
EP - 34
BT - Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery
T2 - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Y2 - 6 August 2006 through 11 August 2006
ER -