Abstract
Translated texts (in any language) are so markedly different from original ones that text classification techniques can be used to tease them apart. Previous work has shown that awareness of these differences can significantly improve statistical machine translation. These results, however, required meta-information on the ontological status of texts (original or translated), which is typically unavailable. In this work we show that the predictions of translationese classifiers are as good as meta-information. First, when a monolingual corpus in the target language is given, to be used for constructing a language model, predicting the translated portions of the corpus, and using only them for the language model, is as good as using the entire corpus. Second, identifying the portions of a parallel corpus that are translated in the direction of the translation task, and using only them for the translation model, is as good as using the entire corpus. We present results from several language pairs and various data sets, indicating that these results are robust and general.
Original language | English |
---|---|
Title of host publication | 10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings |
Editors | Ondrej Bojar, Rajan Chatterjee, Christian Federmann, Barry Haddow, Chris Hokamp, Matthias Huck, Varvara Logacheva, Pavel Pecina |
Place of Publication | Lisbon, Portugal |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 47-57 |
Number of pages | 11 |
ISBN (Electronic) | 9781941643327 |
State | Published - 2015 |
Event | 10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Lisbon, Portugal; Duration: 17 Sep 2015 → 18 Sep 2015 |
Publication series
Name | 10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings |
---|---|
Conference
Conference | 10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 |
---|---|
Country/Territory | Portugal |
City | Lisbon |
Period | 17/09/15 → 18/09/15 |
Bibliographical note
Funding Information: This research was supported by a grant from the Israeli Ministry of Science and Technology. The second author was supported by Cluster of Excellence MMCI at Saarland University. We are grateful to Gennadi Lembersky for his continuous help.
Publisher Copyright:
© EMNLP 2015. All rights reserved.
ASJC Scopus subject areas
- Information Systems
- Computational Theory and Mathematics
- Computer Science Applications