Abstract
Typed lexicons that encode knowledge about the semantic types of an entity name, e.g., that ‘Paris’ denotes a geolocation, product, or person, have proven useful for many text processing tasks. While lexicons may be derived from large-scale knowledge bases (KBs), KBs are inherently imperfect; in particular, they lack coverage of long-tail entity names. We infer the types of a given entity name using multi-source learning, considering information obtained by alignment to the Freebase knowledge base, Web-scale distributional patterns, and global semi-structured contexts retrieved by means of Web search. Evaluation in the challenging domain of social media shows that multi-source learning improves performance compared with rule-based KB lookups, boosting typing results for some semantic categories.
Original language | English |
---|---|
Title of host publication | Proceedings of the 6th Named Entities Workshop collocated with ACL'16 (NEWS-ACL) |
Place of Publication | Berlin, Germany |
Publisher | Association for Computational Linguistics |
Pages | 11-20 |
Number of pages | 10 |
State | Published - 1 Aug 2016 |