Abstract
We treat a parsed text corpus as a labelled directed graph in which nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing supervised learning techniques, can be used to derive a task-specific word similarity measure in this graph. We also propose a new path-constrained graph walk method, in which the walk is guided by high-level knowledge about meaningful edge sequences (paths). Empirical evaluation on the task of named entity coordinate term extraction shows that this framework is preferable to vector-based models for small corpora. We also show that the path-constrained graph walk algorithm yields both performance and scalability gains.
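
As a rough illustration of the idea (not the authors' implementation), the sketch below runs a fixed-length graph walk over a toy labelled directed graph and optionally restricts it to allowed edge-label sequences. The names `build_graph` and `graph_walk`, the toy edges, the edge labels, and the uniform default label weights are all assumptions made for this example; the supervised learning step mentioned in the abstract is not shown.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency map: node -> list of (edge_label, neighbour)."""
    graph = defaultdict(list)
    for src, label, dst in edges:
        graph[src].append((label, dst))
    return graph


def graph_walk(graph, start, label_weights, steps, allowed_paths=None):
    """Spread probability mass from `start` for a fixed number of steps.

    Each state is (node, label_history). Outgoing mass is split in
    proportion to the weight of each edge label (default weight 1.0).
    If `allowed_paths` is given, a step is taken only when the label
    history remains a prefix of at least one allowed label sequence.
    Returns node -> mass accumulated over all steps (a toy similarity score).
    """
    allowed = None
    if allowed_paths is not None:
        allowed = [tuple(p) for p in allowed_paths]

    frontier = {(start, ()): 1.0}
    scores = defaultdict(float)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for (node, history), mass in frontier.items():
            candidates = []
            for label, neighbour in graph.get(node, []):
                new_history = history + (label,)
                if allowed is not None and not any(
                    p[: len(new_history)] == new_history for p in allowed
                ):
                    continue  # this edge would leave every allowed path
                weight = label_weights.get(label, 1.0)
                candidates.append((weight, new_history, neighbour))
            total = sum(w for w, _, _ in candidates)
            if total == 0:
                continue  # dead end: the mass at this state is dropped
            for weight, new_history, neighbour in candidates:
                next_frontier[(neighbour, new_history)] += mass * weight / total
        frontier = next_frontier
        for (node, _), mass in frontier.items():
            scores[node] += mass
    return dict(scores)


# Hypothetical toy edges: (source word, edge label, target word).
TOY_EDGES = [
    ("paris", "conj", "london"),
    ("paris", "prep_in", "france"),
    ("london", "conj", "berlin"),
    ("france", "prep_in_inv", "lyon"),
]
toy_graph = build_graph(TOY_EDGES)

unconstrained = graph_walk(toy_graph, "paris", {}, steps=2)
constrained = graph_walk(
    toy_graph, "paris", {}, steps=2, allowed_paths=[("conj", "conj")]
)
```

In this toy setting the constrained walk never visits nodes that are reachable only through disallowed label sequences, which hints at why constraining paths can reduce the work per query. Whether it also improves ranking quality depends on the choice of paths and on the data.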
Original language | English |
---|---|
Pages | 907-916 |
Number of pages | 10 |
State | Published - 2008 |
Externally published | Yes |
Event | 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation - Honolulu, HI, United States; Duration: 25 Oct 2008 → 27 Oct 2008 |
Conference
Conference | 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Co-located with AMTA 2008 and the International Workshop on Spoken Language Translation |
---|---|
Country/Territory | United States |
City | Honolulu, HI |
Period | 25/10/08 → 27/10/08 |
ASJC Scopus subject areas
- Information Systems
- Computational Theory and Mathematics
- Computer Science Applications