A parallel corpus of translationese

Ella Rabinovich, Shuly Wintner, Ofek Luis Lewinsohn

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

We describe a set of bilingual English-French and English-German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research on translationese and its applications to (human and machine) translation; specifically, they can be used for the task of translationese identification, a research direction that has enjoyed growing interest in recent years. To validate the quality and reliability of the corpora, we replicated previous results of supervised and unsupervised identification of translationese, and further extended the experiments to additional datasets and languages.

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 140-155
Number of pages: 16
ISBN (Print): 9783319754864
State: Published - 2018
Event: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 - Konya, Turkey
Duration: 3 Apr 2016 to 9 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9624 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016
Country/Territory: Turkey
City: Konya
Period: 3/04/16 to 9/04/16

Bibliographical note

Funding Information:
This research was supported by a grant from the Israeli Ministry of Science and Technology. We are grateful to Noam Ordan for much advice and encouragement. We also thank Sergiu Nisioi for helpful suggestions. We are grateful to Philipp Koehn for making the Europarl corpus available; to Cyril Goutte, George Foster and Pierre Isabelle for providing us with an annotated version of the Hansard corpus; to François Yvon and András Farkas (http://farkastranslations.com) for contributing their literary corpora; and to the TED OTP team for sharing TED talks and their translations. We also thank Raphael Salkie for sharing his diverse English-German corpus.

Publisher Copyright:
© Springer International Publishing AG, part of Springer Nature 2018.

Keywords

  • Machine translation
  • Parallel corpora
  • Translationese

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
