We report on the creation of a medium-scale WordNet for Hebrew. We approach this task as an instance of building a lexical resource for a new language (Hebrew) in a setting where similar resources already exist for other languages, and where multilingual requirements call for the new resource to be aligned with the existing ones. We compare the two main paradigms, MultiWordNet and EuroWordNet, with an eye to other minority languages which, like Hebrew, may lack the basic resources needed to carry out such a task. As we show, it is precisely this lack that tips the scales in favor of the MultiWordNet paradigm. Cast in this paradigm, the Hebrew WordNet is strictly aligned to the English lexicon. Consequently, the discrepancies between the languages have to be dealt with: on the one hand, the new resource has to be faithful to the linguistic data of the language for which it is created; on the other, it has to be aligned with existing resources for unrelated languages. We distinguish between contingent and systematic cases of non-equivalence. For the former, we offer a corpus-based methodology that can easily be applied to any new language for which such a resource is planned. For the latter, we propose systematic solutions, focusing on the cases of gender, passive verbs, and antonyms. Where L2 draws finer semantic distinctions (as in the case of gender), we devise a solution that allows full semantic inheritance. Where L2's distinctions are more general (as with passive verbs), our solution is partial and calls for further research. The case of antonyms is fully solved for most parts of speech, but it raises crucial questions about WordNet's typological bias towards English (and other Indo-European languages), questions that may bear both on psycholinguistics and on the feasibility of using WordNet for tasks such as machine translation.
Journal: International Journal of Translation
State: Published, 1 Jan 2007