Abstract
We describe a suite of standards, resources and tools for computational encoding and processing of Modern Hebrew texts. These include an array of XML schemas for representing linguistic resources; a variety of text corpora, raw, automatically processed and manually annotated; lexical databases, including a broad-coverage monolingual lexicon, a bilingual dictionary and a WordNet; and morphological processors which can analyze, generate and disambiguate Hebrew word forms. The resources are developed under centralized supervision, so that they are compatible with each other. They are freely available and many of them have already been used for several applications, both academic and industrial.
Original language | English |
---|---|
Pages (from-to) | 75-98 |
Number of pages | 24 |
Journal | Language Resources and Evaluation |
Volume | 42 |
Issue number | 1 |
State | Published - Mar 2008 |
Bibliographical note
Funding Information: This work was funded by the Israeli Ministry of Science and Technology. Parts of this project were supported by the Israel Science Foundation (grant No. 137/06); by the Israel Internet Association; and by the Caesarea Rothschild Institute for Interdisciplinary Application of Computer Science at the University of Haifa. Several people were involved in this work, and we are extremely grateful to all of them: Meni Adler, Roy Bar-Haim, Dalia Bojan, Ido Dagan, Michael Elhadad, Nomi Guthmann, Adi Milea, Noam Ordan, Erel Segal, Danny Shacham, Shira Schwartz, Yoad Winter, and Shlomo Yona. We are grateful to the reviewers for useful comments.
Keywords
- Corpora
- Hebrew
- Language resources
- Lexicon
- Morphological processing
- WordNet
ASJC Scopus subject areas
- Language and Linguistics
- Education
- Linguistics and Language
- Library and Information Sciences