Extraction of multi-word expressions from small parallel corpora

Yulia Tsvetkov, Shuly Wintner

Research output: Contribution to conference › Paper › peer-review

Abstract

We present a general methodology for extracting multi-word expressions (of various types), along with their translations, from small parallel corpora. We automatically align the parallel corpus and focus on misalignments; these typically indicate expressions in the source language that are translated into the target language in a non-compositional way. We then use a large monolingual corpus to rank and filter the results. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. External evaluation shows an improvement in the performance of a machine translation system that uses the extracted dictionary.
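The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of one way such a pipeline could be organized. It is not the authors' implementation, and every function name in it is hypothetical. It makes four assumptions that are not in the abstract. First, word alignments arrive from an external aligner as (source index, target index) links. Second, a source token counts as "misaligned" when it is not part of a clean one-to-one link. Third, candidates are maximal runs of at least two consecutive misaligned tokens. Fourth, ranking uses a generalized pointwise mutual information (PMI) estimated from a monolingual corpus.

```python
import math
from collections import Counter


def candidate_spans(tokens, links, min_len=2):
    """Return maximal runs of consecutive 'misaligned' source tokens.

    Assumption (not from the paper): a source token is misaligned when it is
    not part of a clean one-to-one link, i.e. it is unaligned or shares a
    source or target position with another link.
    """
    src_deg = Counter(s for s, _ in links)
    tgt_deg = Counter(t for _, t in links)
    clean = {s for s, t in links if src_deg[s] == 1 and tgt_deg[t] == 1}

    spans, run = [], []
    for i, tok in enumerate(tokens):
        if i not in clean:
            run.append(tok)
            continue
        if len(run) >= min_len:
            spans.append(tuple(run))
        run = []
    if len(run) >= min_len:  # flush a run that ends the sentence
        spans.append(tuple(run))
    return spans


def ngram_counts(corpus, n):
    """Count all n-grams (as tuples) in a tokenized corpus."""
    counts = Counter()
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1
    return counts


def rank_candidates(candidates, mono_corpus, threshold=0.0):
    """Rank candidates by generalized PMI, log P(w1..wn) / prod P(wi),
    estimated from a monolingual corpus. Unseen candidates and those
    scoring below `threshold` are filtered out (hypothetical criterion)."""
    unigrams = ngram_counts(mono_corpus, 1)
    total_uni = sum(unigrams.values())
    by_len = {}  # n -> (n-gram counts, total number of n-grams)
    scored = []
    for cand in set(candidates):
        n = len(cand)
        if n not in by_len:
            counts = ngram_counts(mono_corpus, n)
            by_len[n] = (counts, sum(counts.values()))
        counts, total = by_len[n]
        if counts[cand] == 0:
            continue  # never seen in the monolingual corpus
        p_joint = counts[cand] / total
        p_indep = math.prod(unigrams[(w,)] / total_uni for w in cand)
        score = math.log(p_joint / p_indep)
        if score >= threshold:
            scored.append((cand, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy pair: the idiom "kicked the bucket" maps to a single target word
    # (target side assumed to be "er starb gestern").
    src = ["he", "kicked", "the", "bucket", "yesterday"]
    links = {(0, 0), (1, 1), (2, 1), (3, 1), (4, 2)}
    print(candidate_spans(src, links))  # prints [('kicked', 'the', 'bucket')]
```

In this sketch, the many-to-one link from "kicked the bucket" to a single target word is what flags the span. Scoring against the monolingual corpus then filters out spurious candidates. PMI scores for n-grams of different lengths are not strictly comparable, so a real system would likely need length-specific thresholds or a different association measure.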

Original language: English
Pages: 1256-1264
Number of pages: 9
State: Published - 2010
Event: 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China
Duration: 23 Aug 2010 to 27 Aug 2010

Conference

Conference: 23rd International Conference on Computational Linguistics, Coling 2010
Country/Territory: China
City: Beijing
Period: 23/08/10 to 27/08/10

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language
