Are LLMs Breaking MT Metrics? Results of the WMT24 Metrics Shared Task

Markus Freitag, Nitika Mathur, Daniel Deutsch, Chi Kiu Lo, Eleftherios Avramidis, Ricardo Rei, Brian Thompson, Frédéric Blain, Tom Kocmi, Jiayi Wang, David I. Adelani, Marianna Buchicchio, Chrysoula Zerva, Alon Lavie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The WMT24 Metrics Shared Task evaluated the performance of automatic metrics for machine translation (MT), with a major focus on LLM-based translations that were generated as part of the WMT24 General MT Shared Task. As LLMs become increasingly popular in MT, it is crucial to determine whether existing evaluation metrics can accurately assess the output of these systems. To provide a robust benchmark for this evaluation, human assessments were collected using Multidimensional Quality Metrics (MQM), continuing the practice from recent years. Furthermore, building on the success of the previous year, a challenge set subtask was included, requiring participants to design contrastive test suites that specifically target a metric’s ability to identify and penalize different types of translation errors. Finally, the meta-evaluation procedure was refined to better reflect real-world usage of MT metrics, focusing on pairwise accuracy at both the system- and segment-levels. We present an extensive analysis on how well metrics perform on three language pairs: English→Spanish (Latin America), Japanese→Chinese, and English→German. The results strongly confirm the results reported last year, that fine-tuned neural metrics continue to perform well, even when used to evaluate LLM-based translation systems.
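
As a rough illustration of the pairwise accuracy idea mentioned in the abstract, the sketch below counts how often a metric orders a pair of systems the same way the human scores do. It is a simplified, generic version and not the shared task's actual meta-evaluation procedure: the function name, the tie handling (pairs with identical human scores are skipped), and all scores are assumptions made up for the example.

```python
from itertools import combinations


def pairwise_accuracy(human, metric):
    """Fraction of item pairs that the metric orders the same way as the human scores.

    Both arguments map an item name (e.g. a system) to a score, higher is better.
    Pairs with identical human scores are skipped (a simplifying assumption).
    """
    agree = 0
    total = 0
    for a, b in combinations(sorted(human), 2):
        human_diff = human[a] - human[b]
        metric_diff = metric[a] - metric[b]
        if human_diff == 0:
            continue  # no human preference for this pair
        total += 1
        if human_diff * metric_diff > 0:
            agree += 1  # same sign: metric and humans prefer the same item
    return agree / total if total else 0.0


# Hypothetical system-level scores (not taken from the paper).
human_scores = {"sysA": 0.9, "sysB": 0.7, "sysC": 0.4}
metric_scores = {"sysA": 0.81, "sysB": 0.85, "sysC": 0.30}

accuracy = pairwise_accuracy(human_scores, metric_scores)
print(f"pairwise accuracy: {accuracy:.3f}")
```

Here the metric agrees with the human ranking on two of the three system pairs and disagrees on the sysA/sysB pair. The same function could be applied at the segment level by passing per-segment scores instead of per-system scores, again only as an approximation of the idea.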

Original language: English
Title of host publication: WMT 2024 - 9th Conference on Machine Translation, Proceedings of the Conference
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Publisher: Association for Computational Linguistics
Pages: 47-81
Number of pages: 35
ISBN (Electronic): 9798891761797
State: Published - 2024
Externally published: Yes
Event: 9th Conference on Machine Translation, WMT 2024 - Miami, United States
Duration: 15 Nov 2024 to 16 Nov 2024

Publication series

Name: Conference on Machine Translation - Proceedings
Volume: 2024-November
ISSN (Electronic): 2768-0983

Conference

Conference: 9th Conference on Machine Translation, WMT 2024
Country/Territory: United States
City: Miami
Period: 15/11/24 to 16/11/24

Bibliographical note

Publisher Copyright:
©2024 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Software
