A corpus of English learners with Arabic and Hebrew backgrounds

Omaima Abboud, Batia Laufer, Noam Ordan, Uliana Sentsova, Shuly Wintner

Research output: Contribution to journal › Article › peer-review


Learner corpora, datasets that reflect the language of non-native speakers, are instrumental for research on language learning and development, as well as for practical applications, mainly in teaching and education. Such corpora now exist for a plethora of native–foreign language pairs; but until recently, none of them reflected native Hebrew speakers, and very few reflected native Arabic speakers. We introduce a recently released corpus of English essays authored by learners in Israel. The corpus consists of two sub-corpora: one of Arabic native speakers and the other mainly of Hebrew native speakers. We report on the composition and curation of the datasets; specifically, we processed the data so that both sub-corpora are now uniformly represented, facilitating seamless research and computational processing of the data. We provide statistical information on the corpora and outline a few research projects that have already used them. This is the first and only learner corpus in Israel to include learners with two major native languages who follow the same English syllabus within the same educational system. All the resources related to the corpus are freely available.

Original language: English
Journal: Language Resources and Evaluation
State: Published - 2023

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Nature B.V.


Keywords

  • Arabic
  • Corpus linguistics
  • ESL
  • Hebrew
  • Learner corpora

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Linguistics and Language
  • Library and Information Sciences

