We present a verb-complement dictionary of Modern Hebrew, automatically extracted from text corpora. After carefully examining a large set of examples, we defined ten types of verb complements that cover the vast majority of the occurrences of verb complements in the corpora. We explored several collocation measures as indicators of the strength of the association between a verb and its complement, and then used these measures to automatically extract verb complements from corpora. The result is a wide-coverage, accurate dictionary that lists not only the likely complements of each verb, but also the likelihood of each complement. We evaluated the quality of the extracted dictionary both intrinsically and extrinsically. Intrinsically, we showed high precision and recall on randomly (but systematically) selected verbs. Extrinsically, we showed that using the extracted information is beneficial for two applications: prepositional phrase attachment disambiguation and Arabic-to-Hebrew machine translation.
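To make the idea of a collocation measure concrete, the sketch below scores verb-complement pairs with pointwise mutual information (PMI). The abstract does not say which measures were used, so PMI is an assumption chosen only for illustration. The function name `pmi_scores` and the toy English observations are also hypothetical and do not come from the dictionary or its corpora.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return the PMI (in bits) of each (verb, complement) pair seen in `pairs`.

    `pairs` is a list of (verb, complement) observations. PMI is used here
    as one possible collocation measure; it is an assumption, not
    necessarily a measure used in the work described above.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    comp_counts = Counter(comp for _, comp in pairs)
    total = len(pairs)

    scores = {}
    for (verb, comp), n in pair_counts.items():
        p_joint = n / total
        p_verb = verb_counts[verb] / total
        p_comp = comp_counts[comp] / total
        # PMI compares the observed co-occurrence rate with the rate
        # expected if verb and complement were independent.
        scores[(verb, comp)] = math.log2(p_joint / (p_verb * p_comp))
    return scores


# Toy, invented observations (hypothetical data for illustration only).
observations = [
    ("rely", "on"), ("rely", "on"), ("rely", "on"),
    ("look", "at"), ("look", "at"), ("look", "on"),
    ("give", "to"), ("give", "to"),
]
scores = pmi_scores(observations)
```

In this toy data, a pair that co-occurs more often than chance predicts receives a positive score, while a pair that co-occurs less often than chance receives a negative one. A threshold on such a score is one simple way to decide which complements to list for a verb.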
Bibliographical note
Funding Information:
Acknowledgments. This research was supported by the Israel Science Foundation (Grants No. 1269/07, 505/11). We are grateful to Reshef Shilon for his help with the machine translation experiments, and to Yoav Goldberg for his help with the Hebrew parser. Thanks are also due to Kayla Jacobs for several useful comments. We benefitted greatly from the constructive comments of three anonymous reviewers.
Keywords
- Verb subcategorization
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language
- Library and Information Sciences