Abstract
Data balancing is a known technique for improving the performance of classification tasks. In this work we define a novel balancing-via-generation framework termed BalaGen. BalaGen consists of a flexible balancing policy coupled with a text generation mechanism. Combined, these two techniques can be used to augment a dataset toward a more balanced distribution. We evaluate BalaGen on three publicly available semantic utterance classification (SUC) datasets. One of these is a new COVID-19 Q&A dataset published here for the first time. Our work demonstrates that optimal balancing policies can significantly improve classifier performance while augmenting only some of the classes and under-sampling others. Furthermore, capitalizing on the advantages of balancing, we show its usefulness in all relevant BalaGen framework components. We validate the superiority of BalaGen on ten semantic utterance datasets taken from real-life goal-oriented dialogue systems. Based on our results, we encourage using data balancing prior to training for text classification tasks.
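As a rough illustration of what a balancing policy that augments some classes and under-samples others might compute, the sketch below derives a per-class plan from label counts. It is not the BalaGen policy itself: the function names `mean_target` and `balancing_plan`, the choice of the mean class size as the target, and the example labels are all assumptions made for this example, and the text generation step is left out entirely.

```python
from collections import Counter


def mean_target(labels):
    """Hypothetical target size: the mean number of examples per class."""
    counts = Counter(labels)
    return round(sum(counts.values()) / len(counts))


def balancing_plan(labels, target):
    """Map each class to the number of examples to add (positive,
    e.g. via text generation) or to drop (negative, via under-sampling)
    so that every class ends up with `target` examples."""
    counts = Counter(labels)
    return {cls: target - n for cls, n in counts.items()}


# Toy, made-up label distribution for three utterance classes.
labels = ["greet"] * 8 + ["refund"] * 3 + ["cancel"] * 1
target = mean_target(labels)           # 12 examples / 3 classes = 4
plan = balancing_plan(labels, target)
print(plan)  # {'greet': -4, 'refund': 1, 'cancel': 3}
```

In this toy plan the majority class is under-sampled by four examples, while the two minority classes would need one and three generated utterances respectively.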
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics Findings of ACL |
Subtitle of host publication | EMNLP 2020 |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1440-1452 |
Number of pages | 13 |
ISBN (Electronic) | 9781952148903 |
State | Published - 2020 |
Externally published | Yes |
Event | Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020 - Virtual, Online. Duration: 16 Nov 2020 → 20 Nov 2020 |
Publication series
Name | Findings of the Association for Computational Linguistics Findings of ACL: EMNLP 2020 |
---|
Conference
Conference | Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020 |
---|---|
City | Virtual, Online |
Period | 16/11/20 → 20/11/20 |
Bibliographical note
Publisher Copyright: © 2020 Association for Computational Linguistics
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics