Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries

Daniel Deutsch, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer review

Abstract

Reference-based metrics such as ROUGE or BERTScore evaluate the content quality of a summary by comparing the summary to a reference. Ideally, this comparison should measure the summary’s information quality by calculating how much information the summaries have in common. In this work, we analyze the token alignments used by ROUGE and BERTScore to compare summaries and argue that their scores largely cannot be interpreted as measuring information overlap. Rather, they are better estimates of the extent to which the summaries discuss the same topics. Further, we provide evidence that this result holds true for many other summarization evaluation metrics. The consequence of this result is that the most frequently used summarization evaluation metrics do not align with the community’s research goal of generating summaries with high-quality information. However, we conclude by demonstrating that a recently proposed metric, QAEval, which scores summaries using question answering, appears to better capture information quality than current evaluations, highlighting a direction for future research.
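As a point of reference for the kind of reference-based comparison the abstract discusses, the sketch below scores a candidate summary against a reference with the publicly available `rouge-score` and `bert-score` packages. This is a minimal illustration under the assumption that those packages are installed; the example texts and configuration are illustrative, not the setup used in the paper.

```python
# Minimal sketch (assumption: the `rouge-score` and `bert-score` PyPI packages
# are installed). It compares one candidate summary to one reference summary.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "The committee approved the new budget, which increases school funding by 5%."
candidate = "The committee discussed school funding and budgets at its meeting."

# ROUGE: lexical n-gram overlap between the candidate and reference tokens.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
for name, result in rouge.items():
    print(f"{name}: F1 = {result.fmeasure:.3f}")

# BERTScore: token-level alignment of contextual embeddings between the texts.
P, R, F1 = bert_score([candidate], [reference], lang="en", verbose=False)
print(f"BERTScore F1 = {F1.item():.3f}")
```

Both metrics rest on token alignments between the two texts, which is the property the paper argues makes them better readings of topic similarity than of information overlap.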

Original language: English
Title of host publication: CoNLL 2021 - 25th Conference on Computational Natural Language Learning, Proceedings
Editors: Arianna Bisazza, Omri Abend
Publisher: Association for Computational Linguistics (ACL)
Pages: 300-309
Number of pages: 10
ISBN (Electronic): 9781955917056
DOIs
State: Published - 2021
Externally published: Yes
Event: 25th Conference on Computational Natural Language Learning, CoNLL 2021 - Virtual, Online
Duration: 10 Nov 2021 - 11 Nov 2021

Publication series

Name: CoNLL 2021 - 25th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 25th Conference on Computational Natural Language Learning, CoNLL 2021
City: Virtual, Online
Period: 10/11/21 - 11/11/21

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Linguistics and Language
