Language models for machine translation: Original vs. translated texts

Gennadi Lembersky, Noam Ordan, Shuly Wintner

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated to the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.

Original languageEnglish
Title of host publicationEMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages363-374
Number of pages12
StatePublished - 2011
EventConference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom
Duration: 27 Jul 201131 Jul 2011

Publication series

NameEMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

ConferenceConference on Empirical Methods in Natural Language Processing, EMNLP 2011
Country/TerritoryUnited Kingdom
CityEdinburgh
Period27/07/1131/07/11

Bibliographical note

Funding Information:
The first author would like to thank Esther and Kiri for their loving support.

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Fingerprint

Dive into the research topics of 'Language models for machine translation: Original vs. translated texts'. Together they form a unique fingerprint.

Cite this