Abstract
Learner corpora, datasets that reflect the language of non-native speakers, are instrumental for research on language learning and development, as well as for practical applications, mainly in teaching and education. Such corpora now exist for a plethora of native–foreign language pairs, but until recently none of them reflected native Hebrew speakers, and very few reflected native Arabic speakers. We introduce a recently released corpus of English essays authored by learners in Israel. The corpus consists of two sub-corpora, one written by native Arabic speakers and the other mainly by native Hebrew speakers. We report on the composition and curation of the datasets; specifically, we processed the data so that both sub-corpora are now uniformly represented, facilitating seamless research and computational processing of the data. We provide statistical information on the corpora and outline a few research projects that have already used them. This is the first and only learner corpus in Israel that covers two major native languages of learners who study English under the same syllabus within a single educational system. All the resources related to the corpus are freely available.
Original language | English |
---|---|
Journal | Language Resources and Evaluation |
State | Published - 2023 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Nature B.V.
Keywords
- Arabic
- Corpus linguistics
- ESL
- Hebrew
- Learner corpora
ASJC Scopus subject areas
- Language and Linguistics
- Education
- Linguistics and Language
- Library and Information Sciences