TY - GEN
T1 - Tagging a Hebrew corpus
T2 - 6th International Conference on Language Resources and Evaluation, LREC 2008
AU - Adler, Meni
AU - Netzer, Yael
AU - Goldberg, Yoav
AU - Gabay, David
AU - Elhadad, Michael
PY - 2008
Y1 - 2008
N2 - We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew while focusing on four aspects: the tagset should be consistent with common linguistic knowledge; there should be maximal agreement among taggers as to the tags assigned to maintain consistency; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications relying on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni forms in Hebrew. We explain how this tag is defined, and how it helped us improve manual tagging accuracy to a high level, while improving automatic tagging and helping in the task of syntactic chunking.
UR - http://www.scopus.com/inward/record.url?scp=84966913381&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84966913381
T3 - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
SP - 3167
EP - 3174
BT - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
PB - European Language Resources Association (ELRA)
Y2 - 28 May 2008 through 30 May 2008
ER -