Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics

Daniel Deutsch, Dan Roth

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Question answering-based summarization evaluation metrics must automatically determine whether the QA model's prediction is correct or not, a task known as answer verification. In this work, we benchmark the lexical answer verification methods which have been used by current QA-based metrics as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC out-performs the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to overall QA-based metric quality: In some scenarios, using a worse verification method - or using none at all - has comparable performance to using the best verification method, a result that we attribute to properties of the datasets.

Original languageEnglish
Title of host publicationACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022
EditorsSmaranda Muresan, Preslav Nakov, Aline Villavicencio
PublisherAssociation for Computational Linguistics (ACL)
Pages3759-3765
Number of pages7
ISBN (Electronic)9781955917254
DOIs
StatePublished - 2022
Externally publishedYes
Event60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland
Duration: 22 May 202227 May 2022

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

Conference60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/TerritoryIreland
CityDublin
Period22/05/2227/05/22

Bibliographical note

Publisher Copyright:
© 2022 Association for Computational Linguistics.

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics'. Together they form a unique fingerprint.

Cite this