Learning Hebrew roots: Machine learning with linguistic constraints

Ezra Daya, Dan Roth, Shuly Wintner

Research output: Contribution to conference › Paper › peer-review

Abstract

The morphology of Semitic languages is unique in the sense that the major word-formation mechanism is an inherently non-concatenative process of interdigitation, whereby two morphemes, a root and a pattern, are interwoven. Identifying the root of a given word in a Semitic language is an important task, in some cases a crucial part of morphological analysis. It is also a non-trivial task, which many humans find challenging. We present a machine learning approach to the problem of extracting roots of Hebrew words. Given the large number of potential roots (thousands), we address the problem as one of combining several classifiers, each predicting the value of one of the root's consonants. We show that when these predictors are combined by enforcing some fairly simple linguistic constraints, high accuracy, which compares favorably with human performance on this task, can be achieved.
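The idea of combining per-consonant predictors under a constraint can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scores, the transliterated consonants, the function `combine_predictions`, and the constraint used here (membership in a small hypothetical list of known roots) are all assumptions made for illustration, since the abstract does not specify the classifiers or the constraints.

```python
from itertools import product

def combine_predictions(position_scores, is_valid):
    """Return the highest-scoring root that satisfies the constraint.

    position_scores: one dict per root position, mapping a candidate
    consonant to the score a (hypothetical) classifier assigned to it.
    is_valid: predicate over a candidate root (a tuple of consonants).
    """
    best_root, best_score = None, float("-inf")
    # Enumerate every combination of one candidate consonant per position.
    for combo in product(*(scores.items() for scores in position_scores)):
        root = tuple(consonant for consonant, _ in combo)
        score = sum(s for _, s in combo)
        if is_valid(root) and score > best_score:
            best_root, best_score = root, score
    return best_root

# Toy scores for a three-consonant root (invented values, Latin transliteration).
position_scores = [
    {"m": 0.6, "k": 0.4},  # classifier for the first consonant
    {"t": 0.7, "k": 0.3},  # classifier for the second consonant
    {"b": 0.8, "r": 0.2},  # classifier for the third consonant
]

# Hypothetical constraint: the predicted root must be a known root.
known_roots = {("k", "t", "b"), ("m", "k", "r")}

unconstrained = combine_predictions(position_scores, lambda root: True)
constrained = combine_predictions(position_scores, lambda root: root in known_roots)
print(unconstrained, constrained)
```

In this toy setting the independent predictions yield a root that is not in the list, while the constrained combination falls back to the best-scoring root that is. Exhaustive enumeration is only practical here because each position has very few candidates.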

Original language: English
Pages: 357-364
Number of pages: 8
State: Published - 2004
Event: 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004 - Barcelona, Spain
Duration: 25 Jul 2004 to 26 Jul 2004

Conference

Conference: 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004
Country/Territory: Spain
City: Barcelona
Period: 25/07/04 to 26/07/04

Bibliographical note

Funding Information:
This work was supported by The Caesarea Edmond Benjamin de Rothschild Foundation Institute for Interdisciplinary Applications of Computer Science. Dan Roth is supported by NSF grants CAREER IIS-9984168, ITR IIS-0085836, and ITR-IIS 00-85980. We thank Meira Hess and Liron Ashkenazi for annotating the corpus and Alon Lavie and Ido Dagan for useful comments.

Publisher Copyright:
© 2005 Association for Computational Linguistics

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
