A corpus of native, non-native and translated texts

Sergiu Nisioi, Ella Rabinovich, Liviu P. Dinu, Shuly Wintner

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We describe a monolingual English corpus of original and (human) translated texts, with an accurate annotation of speaker properties, including the original language of the utterances and the speaker's country of origin. We thus obtain three sub-corpora of texts reflecting native English, non-native English, and English translated from a variety of European languages. This dataset will facilitate the investigation of similarities and differences between these kinds of sub-languages. Moreover, it will facilitate a unified comparative study of translations and language produced by (highly fluent) non-native speakers, two closely-related phenomena that have only been studied in isolation so far.

Original languageEnglish
Title of host publicationProceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
EditorsNicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
PublisherEuropean Language Resources Association (ELRA)
Pages4197-4201
Number of pages5
ISBN (Electronic)9782951740891
StatePublished - 2016
Event10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 23 May 201628 May 2016

Publication series

NameProceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

Conference

Conference10th International Conference on Language Resources and Evaluation, LREC 2016
Country/TerritorySlovenia
CityPortoroz
Period23/05/1628/05/16

Bibliographical note

Funding Information:
This research was supported by the Israeli Ministry of Science and Technology and partly by UEFISCDI, PNII-IDPCE-2011-3-0959. We are grateful to Noam Ordan for his advices and helpful suggestions.

Keywords

  • Bilingualism
  • Corpus linguistics
  • Second language acquisition
  • Translation

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Fingerprint

Dive into the research topics of 'A corpus of native, non-native and translated texts'. Together they form a unique fingerprint.

Cite this