Abstract
We present the Hebrew Essay Corpus: an annotated corpus of Hebrew-language argumentative essays authored by prospective higher-education students. The corpus includes essays by native speakers, written as part of the psychometric exam used to assess their future success in academic studies, and essays by non-native speakers with three different native languages, written as part of a language aptitude test. The corpus is uniformly encoded and stored. The non-native essays were annotated with target hypotheses whose main goal is to make the texts amenable to automatic processing (morphological and syntactic analysis). The corpus is available for academic purposes upon request. We describe the corpus and the error correction and annotation schemes used in its analysis. In addition to introducing this new resource, we discuss the challenges of identifying and analyzing non-native language use in general, and propose various ways of dealing with these challenges.
Original language | English |
---|---|
Title of host publication | 2022 Language Resources and Evaluation Conference, LREC 2022 |
Editors | Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis |
Publisher | European Language Resources Association (ELRA) |
Pages | 5580-5586 |
Number of pages | 7 |
ISBN (Electronic) | 9791095546726 |
State | Published - 2022 |
Event | 13th International Conference on Language Resources and Evaluation Conference, LREC 2022 - Marseille, France. Duration: 20 Jun 2022 → 25 Jun 2022 |
Publication series
Name | 2022 Language Resources and Evaluation Conference, LREC 2022 |
---|
Conference
Conference | 13th International Conference on Language Resources and Evaluation Conference, LREC 2022 |
---|---|
Country/Territory | France |
City | Marseille |
Period | 20/06/22 → 25/06/22 |
Bibliographical note
Publisher Copyright: © European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.
Keywords
- Hebrew
- Learner corpora
- Non-native language
ASJC Scopus subject areas
- Language and Linguistics
- Library and Information Sciences
- Linguistics and Language
- Education