Lightly supervised transliteration for machine translation

Amit Kirschenbaum, Shuly Wintner

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We present a Hebrew-to-English transliteration method in the context of a machine translation system. Our method uses machine learning to determine which terms are to be transliterated rather than translated. The training corpus for this purpose includes only positive examples, acquired semi-automatically. Our classifier eliminates more than 38% of the errors made by a baseline method. The identified terms are then transliterated. We present an SMT-based transliteration model trained with a parallel corpus extracted from Wikipedia using a fairly simple method that requires minimal knowledge. The correct result is produced in more than 76% of the cases, and in 92% of the instances it is among the top-5 results. We also demonstrate a small improvement in the performance of a Hebrew-to-English MT system that uses our transliteration module.

Original language: English
Title of host publication: EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 433-441
Number of pages: 9
ISBN (Print): 9781932432169
State: Published - 2009
Event: 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009 - Athens, Greece
Duration: 30 Mar 2009 to 3 Apr 2009

Publication series

Name: EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings

Conference

Conference: 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009
Country/Territory: Greece
City: Athens
Period: 30/03/09 to 3/04/09

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
