Abstract
The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating corpus-level semantic similarity metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of such metrics, allowing sensible comparison of their behavior. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by applying them to a collection of classical and state-of-the-art metrics. Our measures reveal that recently developed metrics are becoming better at identifying semantic distributional mismatch, while classical metrics are more sensitive to perturbations at the surface level of the text.
| Original language | English |
|---|---|
| Title of host publication | GEM 2022 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, Proceedings of the Workshop |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 405-416 |
| Number of pages | 12 |
| ISBN (Electronic) | 9781959429128 |
| State | Published - 2022 |
| Externally published | Yes |
| Event | 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, GEM 2022, as part of EMNLP 2022 - Abu Dhabi, United Arab Emirates |
| Duration | 7 Dec 2022 → … |
Publication series
| Name | GEM 2022 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, Proceedings of the Workshop |
|---|---|
Conference
| Conference | 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, GEM 2022, as part of EMNLP 2022 |
|---|---|
| Country/Territory | United Arab Emirates |
| City | Abu Dhabi |
| Period | 7 Dec 2022 → … |
Bibliographical note
Publisher Copyright: © 2022 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems
Fingerprint
Dive into the research topics of 'Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora'. Together they form a unique fingerprint.