Abstract
We describe a set of bilingual English-French and English-German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks, and political commentary. They will be instrumental for research on translationese and its applications to (human and machine) translation; specifically, they can be used for the task of translationese identification, a research direction that has enjoyed growing interest in recent years. To validate the quality and reliability of the corpora, we replicated previous results of supervised and unsupervised identification of translationese, and further extended the experiments to additional datasets and languages.
Original language | English |
---|---|
Title of host publication | Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers |
Editors | Alexander Gelbukh |
Publisher | Springer Verlag |
Pages | 140-155 |
Number of pages | 16 |
ISBN (Print) | 9783319754864 |
State | Published - 2018 |
Event | 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 - Konya, Turkey; Duration: 3 Apr 2016 → 9 Apr 2016 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 9624 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 |
---|---|
Country/Territory | Turkey |
City | Konya |
Period | 3/04/16 → 9/04/16 |
Bibliographical note
Publisher Copyright: © Springer International Publishing AG, part of Springer Nature 2018.
Keywords
- Machine translation
- Parallel corpora
- Translationese
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science