Machine Translation into Low-resource Language Varieties

Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

State-of-the-art machine translation (MT) systems are typically trained to generate "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language. Such varieties are often low-resource, and hence do not benefit from contemporary NLP solutions, MT included. We propose a general framework to rapidly adapt MT systems to generate language varieties that are close to, but different from, the standard target language, using no parallel (source-variety) data. This also includes adaptation of MT systems to low-resource typologically related target languages. We experiment with adapting an English-Russian MT system to generate Ukrainian and Belarusian, an English-Norwegian Bokmål system to generate Nynorsk, and an English-Arabic system to generate four Arabic dialects, obtaining significant improvements over competitive baselines.

Original language: English
Title of host publication: ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 110-121
Number of pages: 12
ISBN (Electronic): 9781954085527
State: Published - 2021
Event: Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021 - Virtual, Online
Duration: 1 Aug 2021 to 6 Aug 2021

Publication series

Name: ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference
Volume: 2

Conference

Conference: Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021
City: Virtual, Online
Period: 1/08/21 to 6/08/21

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
  • Linguistics and Language
  • Language and Linguistics
