Abstract
The WMT24 Metrics Shared Task evaluated the performance of automatic metrics for machine translation (MT), with a major focus on LLM-based translations that were generated as part of the WMT24 General MT Shared Task. As LLMs become increasingly popular in MT, it is crucial to determine whether existing evaluation metrics can accurately assess the output of these systems. To provide a robust benchmark for this evaluation, human assessments were collected using Multidimensional Quality Metrics (MQM), continuing the practice from recent years. Furthermore, building on the success of the previous year, a challenge set subtask was included, requiring participants to design contrastive test suites that specifically target a metric’s ability to identify and penalize different types of translation errors. Finally, the meta-evaluation procedure was refined to better reflect real-world usage of MT metrics, focusing on pairwise accuracy at both the system and segment levels. We present an extensive analysis of how well metrics perform on three language pairs: English→Spanish (Latin America), Japanese→Chinese, and English→German. The results strongly confirm last year's findings: fine-tuned neural metrics continue to perform well, even when used to evaluate LLM-based translation systems.
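To make the notion of pairwise accuracy concrete, the following is a minimal Python sketch of its plain form: the fraction of item pairs (systems or segments) that a metric orders the same way as the human scores do. The function name, the example scores, and the choice to skip human ties are assumptions made for illustration; the shared task's actual meta-evaluation may handle ties and statistical uncertainty differently.

```python
from itertools import combinations


def pairwise_accuracy(human, metric):
    """Fraction of item pairs that the metric orders the same way as humans.

    Both arguments map an item name (a system or a segment) to a score where
    higher means better. Pairs tied in the human scores are skipped, which is
    a simplification assumed for this sketch.
    """
    if human.keys() != metric.keys():
        raise ValueError("human and metric must score the same items")

    agree = 0
    total = 0
    for a, b in combinations(sorted(human), 2):
        human_diff = human[a] - human[b]
        metric_diff = metric[a] - metric[b]
        if human_diff == 0:
            continue  # no human preference for this pair
        total += 1
        if human_diff * metric_diff > 0:  # same sign means same ordering
            agree += 1

    if total == 0:
        raise ValueError("no pair has a human preference")
    return agree / total


# Hypothetical scores, not taken from the shared task.
human_scores = {"sysA": 0.9, "sysB": 0.5, "sysC": 0.1}
metric_scores = {"sysA": 0.80, "sysB": 0.85, "sysC": 0.20}
accuracy = pairwise_accuracy(human_scores, metric_scores)
```

In this hypothetical example the metric agrees with the human ranking on the pairs (sysA, sysC) and (sysB, sysC) but swaps sysA and sysB, so it matches two of the three pairs.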
Original language | English |
---|---|
Title of host publication | WMT 2024 - 9th Conference on Machine Translation, Proceedings of the Conference |
Editors | Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz |
Publisher | Association for Computational Linguistics |
Pages | 47-81 |
Number of pages | 35 |
ISBN (Electronic) | 9798891761797 |
State | Published - 2024 |
Externally published | Yes |
Event | 9th Conference on Machine Translation, WMT 2024 - Miami, United States. Duration: 15 Nov 2024 → 16 Nov 2024 |
Publication series
Name | Conference on Machine Translation - Proceedings |
---|---|
Volume | 2024-November |
ISSN (Electronic) | 2768-0983 |
Conference
Conference | 9th Conference on Machine Translation, WMT 2024 |
---|---|
Country/Territory | United States |
City | Miami |
Period | 15/11/24 → 16/11/24 |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Software