While there has been much research on automatically constructing structured Knowledge Bases (KBs), most of it has focused on generating facts to populate a KB. However, a useful KB must go beyond facts. For example, glosses (short natural language definitions) have been found to be very useful in tasks such as Word Sense Disambiguation. However, the important problem of Automatic Gloss Finding, i.e., assigning glosses to entities in an initially gloss-free KB, is relatively unexplored. We address that gap in this paper. In particular, we propose GLOFIN, a hierarchical semi-supervised learning (SSL) algorithm for this problem which makes effective use of limited amounts of supervision and available ontological constraints. To the best of our knowledge, GLOFIN is the first system for this task. Through extensive experiments on real-world datasets, we demonstrate GLOFIN's effectiveness. It is encouraging to see that GLOFIN outperforms other state-of-the-art SSL algorithms, especially in low-supervision settings. We also demonstrate GLOFIN's robustness to noise through experiments on a wide variety of KBs, ranging from user-contributed (e.g., Freebase) to automatically constructed (e.g., NELL). To facilitate further research in this area, we have made the datasets and code used in this paper publicly available.
|Title of host publication||WSDM 2015 - Proceedings of the 8th ACM International Conference on Web Search and Data Mining|
|Publisher||Association for Computing Machinery|
|Number of pages||10|
|State||Published - 2 Feb 2015|
|Event||8th ACM International Conference on Web Search and Data Mining, WSDM 2015 - Shanghai, China|
Duration: 31 Jan 2015 → 6 Feb 2015
Bibliographical note
Funding Information:
This work was supported by the SmartState Program (Translational Biomedical Informatics Chair Endowment), SC Research Centers for Economic Excellence. (SMM).
Copyright © 2015 ACM.
Keywords
- Gloss finding
- Hierarchical learning
- Web mining
ASJC Scopus subject areas
- Computer Networks and Communications