A general method for creating a bilingual transliteration dictionary

Amit Kirschenbaum, Shuly Wintner

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Transliteration is the rendering in one language of terms from another language (and, possibly, another writing system), approximating spelling and/or phonetic equivalents between the two languages. A transliteration dictionary is a crucial resource for a variety of natural language applications, most notably machine translation. We describe a general method for creating bilingual transliteration dictionaries from Wikipedia article titles. The method can be applied to any language pair with Wikipedia presence, independently of the writing systems involved, and requires only a single simple resource that can be provided by any literate bilingual speaker. It was successfully applied to extract a Hebrew-English transliteration dictionary which, when incorporated in a machine translation system, indeed improved its performance.

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Editors: Daniel Tapias, Irene Russo, Olivier Hamon, Stelios Piperidis, Nicoletta Calzolari, Khalid Choukri, Joseph Mariani, Helene Mazo, Bente Maegaard, Jan Odijk, Mike Rosner
Publisher: European Language Resources Association (ELRA)
Pages: 273-276
Number of pages: 4
ISBN (Electronic): 2951740867, 9782951740860
State: Published - 2010
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: 17 May 2010 to 23 May 2010

Publication series

Name: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

Conference

Conference: 7th International Conference on Language Resources and Evaluation, LREC 2010
Country/Territory: Malta
City: Valletta
Period: 17/05/10 to 23/05/10

Bibliographical note

Funding Information:
We wish to thank Gennadi Lembersky for his help in integrating our work into the MT system, as well as Erik Peterson and Alon Lavie for providing the code for extracting bilingual article titles from Wikipedia. This research was supported by the Israel Science Foundation (grant No. 137/06); by the Israel Internet Association; by the Knowledge Center for Processing Hebrew; and by the Caesarea Rothschild Institute for Interdisciplinary Application of Computer Science at the University of Haifa.

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics
