The hitchhiker's guide to testing statistical significance in natural language processing

Rotem Dror, Gili Baumer, Segev Shlomov, Roi Reichart

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Statistical significance testing is a standard statistical tool designed to ensure that experimental results are not coincidental. In this opinion/theoretical paper we discuss the role of statistical significance testing in Natural Language Processing (NLP) research. We establish the fundamental concepts of significance testing and discuss the specific aspects of NLP tasks, experimental setups and evaluation measures that affect the choice of significance tests in NLP research. Based on this discussion, we propose a simple practical protocol for statistical significance test selection in NLP setups and accompany this protocol with a brief survey of the most relevant tests. We then survey recent empirical papers published in ACL and TACL during 2017 and show that while our community assigns great value to experimental results, statistical significance testing is often ignored or misused.
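To make the idea of testing whether one system's improvement over another is coincidental more concrete, here is a minimal sketch of a paired approximate randomization (permutation) test. This is one of the nonparametric tests commonly used to compare two NLP systems on the same test set. It is an illustration only, not the selection protocol proposed in the paper. The function name `paired_permutation_test` and its parameters are hypothetical. The sketch assumes a per-example score for each system (for example, 0/1 correctness per sentence) and uses the mean score difference as its test statistic.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_samples=10_000, seed=0):
    """Two-sided paired approximate randomization test (illustrative sketch).

    scores_a, scores_b: per-example scores of systems A and B on the SAME
    test examples (assumed aligned by index). Returns an estimated p-value
    for the null hypothesis that the two systems are exchangeable, i.e.
    that the observed mean difference arose by chance.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    # Per-example differences; the test statistic is |mean difference|.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / n)
    count = 0
    for _ in range(n_samples):
        # Swapping A's and B's outputs on an example flips the sign of its difference.
        permuted_sum = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted_sum / n) >= observed:
            count += 1
    # Add-one correction keeps the Monte Carlo p-value estimate above zero.
    return (count + 1) / (n_samples + 1)


if __name__ == "__main__":
    # Hypothetical per-sentence 0/1 correctness for two systems.
    system_a = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
    system_b = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]
    print(paired_permutation_test(system_a, system_b))
```

The test is paired because both systems are scored on the same examples, so only within-example swaps are allowed. A small p-value (for example, below a pre-chosen 0.05) suggests that the difference is unlikely under the null hypothesis. Which test is appropriate in a given setup depends on the evaluation measure and the assumptions one is willing to make, which is the question the abstract describes the paper as addressing.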

Original language: English
Title of host publication: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Publisher: Association for Computational Linguistics (ACL)
Pages: 1383-1392
Number of pages: 10
ISBN (Electronic): 9781948087322
State: Published - 2018
Externally published: Yes
Event: 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 - Melbourne, Australia
Duration: 15 Jul 2018 to 20 Jul 2018

Publication series

Name: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume: 1

Conference

Conference: 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018
Country/Territory: Australia
City: Melbourne
Period: 15/07/18 to 20/07/18

Bibliographical note

Publisher Copyright:
© 2018 Association for Computational Linguistics

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
