Identifying translationese at the word and sub-word level

Ehud Alexander Avner, Noam Ordan, Shuly Wintner

Research output: Contribution to journal › Article › peer-review

Abstract

We use text classification to distinguish automatically between original and translated texts in Hebrew, a morphologically complex language. To this end, we design several linguistically informed feature sets that capture word-level and sub-word-level (in particular, morphological) properties of Hebrew. Such features are abstract enough to allow for the development of accurate, robust classifiers, and they also lend themselves to linguistic interpretation. Careful evaluation shows that some of the classifiers we define are, indeed, highly accurate, and scale up nicely to domains that they were not trained on. In addition, analysis of the best features provides insight into the morphological properties of translated texts.
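As a rough illustration of the general idea (not the authors' feature sets or classifier), the sketch below combines word-level features with sub-word features and feeds them to a small multinomial Naive Bayes classifier. Everything in it is an assumption made for illustration: the names `extract_features` and `NaiveBayes`, the use of character trigrams as a crude stand-in for morphological features, and the toy English training strings, which are invented placeholders rather than data from the study.

```python
import math
from collections import Counter


def extract_features(text, n=3):
    """Count word unigrams plus character n-grams within each word.

    Character n-grams are only a crude proxy for morphological features;
    this is an illustrative assumption, not the paper's feature design.
    """
    feats = Counter()
    for word in text.lower().split():
        feats["w:" + word] += 1            # word-level feature
        padded = "^" + word + "$"          # mark word boundaries
        for i in range(len(padded) - n + 1):
            feats["c:" + padded[i:i + n]] += 1  # sub-word feature
    return feats


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.counts = {}              # label -> feature counts
        self.docs = Counter(labels)   # label -> number of training texts
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.counts.setdefault(label, Counter()).update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)
            for feat, freq in feats.items():
                if feat in self.vocab:  # ignore features never seen in training
                    score += freq * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented placeholder data: each "class" simply favours a different suffix.
train_texts = ["walking talking", "singing", "creation nation", "motion"]
train_labels = ["original", "original", "translated", "translated"]
clf = NaiveBayes().fit(train_texts, train_labels)

# Neither word occurs in the training data, so only sub-word features apply.
print(clf.predict("running"))
print(clf.predict("station"))
```

Because the two test words are unseen, the word-level features contribute nothing and the decision rests entirely on shared character trigrams, which is the sense in which sub-word features can generalize beyond the training vocabulary.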

Original language: English
Pages (from-to): 30-54
Number of pages: 25
Journal: Digital Scholarship in the Humanities
Volume: 31
Issue number: 1
State: Published - 1 Apr 2016

Bibliographical note

Publisher Copyright:
© The Author 2014. Published by Oxford University Press on behalf of EADH.

ASJC Scopus subject areas

  • Information Systems
  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
