Abstract
We present a general methodology for extracting multi-word expressions (of various types), along with their translations, from small parallel corpora. We automatically align the parallel corpus and focus on misalignments; these typically indicate expressions in the source language that are translated to the target in a noncompositional way. We then use a large monolingual corpus to rank and filter the results. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. External evaluation shows an improvement in the performance of machine translation that uses the extracted dictionary.
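The pipeline described in the abstract (align, collect misalignments, rank with monolingual statistics) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that a "misalignment" is a run of source tokens with no alignment link, and that candidates are ranked by pointwise mutual information (PMI) over bigram counts. The function names `misaligned_spans`, `pmi`, and `rank_candidates` are hypothetical, and the abstract does not specify the actual misalignment criterion or ranking statistic.

```python
import math
from collections import Counter


def misaligned_spans(src_tokens, aligned_src_indices):
    """Return maximal runs (length >= 2) of source tokens that have no alignment link.

    Assumption: an unaligned run stands in for a 'misalignment'.
    """
    spans, run = [], []
    for i, tok in enumerate(src_tokens):
        if i in aligned_src_indices:
            if len(run) >= 2:
                spans.append(tuple(run))
            run = []
        else:
            run.append(tok)
    if len(run) >= 2:
        spans.append(tuple(run))
    return spans


def pmi(bigram, unigram_counts, bigram_counts):
    """Pointwise mutual information of a bigram from monolingual counts."""
    total_uni = sum(unigram_counts.values())
    total_bi = sum(bigram_counts.values())
    if total_uni == 0 or total_bi == 0:
        return float("-inf")
    p_xy = bigram_counts[bigram] / total_bi
    p_x = unigram_counts[bigram[0]] / total_uni
    p_y = unigram_counts[bigram[1]] / total_uni
    if p_xy == 0 or p_x == 0 or p_y == 0:
        return float("-inf")
    return math.log2(p_xy / (p_x * p_y))


def rank_candidates(candidates, mono_tokens):
    """Score two-word candidates against a monolingual corpus, best first."""
    unigrams = Counter(mono_tokens)
    bigrams = Counter(zip(mono_tokens, mono_tokens[1:]))
    scored = [(c, pmi(c, unigrams, bigrams)) for c in candidates if len(c) == 2]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: indices 0 and 3 are aligned, so "took part" is left unaligned.
    sentence = ["he", "took", "part", "yesterday"]
    candidates = misaligned_spans(sentence, {0, 3})
    mono = "he took part and he left and he ran and she took part".split()
    for cand, score in rank_candidates(candidates + [("he", "took")], mono):
        print(" ".join(cand), round(score, 3))
```

The sketch handles only two-word candidates to stay short. A realistic system would also need to recover the target-side translation of each candidate and apply a filtering threshold, neither of which is shown here.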
Original language | English |
---|---|
Pages | 1256-1264 |
Number of pages | 9 |
State | Published - 2010 |
Event | 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China; Duration: 23 Aug 2010 → 27 Aug 2010 |
Conference
Conference | 23rd International Conference on Computational Linguistics, Coling 2010 |
---|---|
Country/Territory | China |
City | Beijing |
Period | 23/08/10 → 27/08/10 |
ASJC Scopus subject areas
- Language and Linguistics
- Computational Theory and Mathematics
- Linguistics and Language