Abstract
Statistical significance testing is a standard statistical tool designed to ensure that experimental results are not coincidental. In this opinion/theoretical paper we discuss the role of statistical significance testing in Natural Language Processing (NLP) research. We establish the fundamental concepts of significance testing and discuss the specific aspects of NLP tasks, experimental setups and evaluation measures that affect the choice of significance tests in NLP research. Based on this discussion, we propose a simple practical protocol for statistical significance test selection in NLP setups and accompany this protocol with a brief survey of the most relevant tests. We then survey recent empirical papers published in ACL and TACL during 2017 and show that while our community assigns great value to experimental results, statistical significance testing is often ignored or misused.
Original language | English |
---|---|
Title of host publication | ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1383-1392 |
Number of pages | 10 |
ISBN (Electronic) | 9781948087322 |
DOIs | |
State | Published - 2018 |
Externally published | Yes |
Event | 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 - Melbourne, Australia Duration: 15 Jul 2018 → 20 Jul 2018 |
Publication series
Name | ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) |
---|---|
Volume | 1 |
Conference
Conference | 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 15/07/18 → 20/07/18 |
Bibliographical note
Publisher Copyright:© 2018 Association for Computational Linguistics
ASJC Scopus subject areas
- Software
- Computational Theory and Mathematics