Identifying multi-word expressions by leveraging morphological and syntactic idiosyncrasy

Hassan Al-Haj, Shuly Wintner

Research output: Contribution to conferencePaperpeer-review

Abstract

Multi-word expressions constitute a significant portion of the lexicon of every natural language, and handling them correctly is mandatory for various NLP applications. Yet such entities are notoriously hard to define, and are consequently missing from standard lexicons and dictionaries. Multi-word expressions exhibit idiosyncratic behavior on various levels: orthographic, morphological, syntactic and semantic. In this work we take advantage of the morphological and syntactic idiosyncrasy of Hebrew noun compounds and employ it to extract such expressions from text corpora. We show that relying on linguistic information dramatically improves the accuracy of compound extraction, reducing over one third of the errors compared with the best baseline.

Original languageEnglish
Pages10-18
Number of pages9
StatePublished - 2010
Event23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China
Duration: 23 Aug 201027 Aug 2010

Conference

Conference23rd International Conference on Computational Linguistics, Coling 2010
Country/TerritoryChina
CityBeijing
Period23/08/1027/08/10

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Identifying multi-word expressions by leveraging morphological and syntactic idiosyncrasy'. Together they form a unique fingerprint.

Cite this