Abstract
State-of-the-art machine translation (MT) systems are typically trained to generate "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language. Such varieties are often low-resource, and hence do not benefit from contemporary NLP solutions, MT included. We propose a general framework to rapidly adapt MT systems to generate language varieties that are close to, but different from, the standard target language, using no parallel (source-variety) data. This also includes adaptation of MT systems to low-resource typologically related target languages. We experiment with adapting an English-Russian MT system to generate Ukrainian and Belarusian, an English-Norwegian Bokmål system to generate Nynorsk, and an English-Arabic system to generate four Arabic dialects, obtaining significant improvements over competitive baselines.
Original language | English |
---|---|
Title of host publication | ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 110-121 |
Number of pages | 12 |
ISBN (Electronic) | 9781954085527 |
State | Published - 2021 |
Event | Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021 - Virtual, Online. Duration: 1 Aug 2021 → 6 Aug 2021 |
Publication series
Name | ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference |
---|---|
Volume | 2 |
Conference
Conference | Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021 |
---|---|
City | Virtual, Online |
Period | 1/08/21 → 6/08/21 |
Bibliographical note
Publisher Copyright: © 2021 Association for Computational Linguistics.
ASJC Scopus subject areas
- Software
- Computational Theory and Mathematics
- Linguistics and Language
- Language and Linguistics