Adapting translation models to translationese improves SMT

Gennadi Lembersky, Noam Ordan, Shuly Wintner

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Translation models used for statistical machine translation are compiled from parallel corpora; such corpora are manually translated, but the direction of translation is usually unknown, and is consequently ignored. However, much research in Translation Studies indicates that the direction of translation matters, as translated language (translationese) has many unique properties. Specifically, phrase tables constructed from parallel corpora translated in the same direction as the translation task perform better than ones constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the 'wrong' direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables, by adapting the translation model to the special properties of translationese. We define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.

Original language: English
Title of host publication: EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 255-265
Number of pages: 11
ISBN (Electronic): 9781937284190
State: Published - 2012
Event: 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2012 - Avignon, France
Duration: 23 Apr 2012 - 27 Apr 2012

Publication series

Name: EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings

Conference

Conference: 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2012
Country/Territory: France
City: Avignon
Period: 23/04/12 - 27/04/12

Bibliographical note

Publisher Copyright:
© 2012 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
