Abstract
Based on recent advances in language modeling and text generation, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural language model to synthesize new labeled data for supervised learning, focusing on cases where labeled data is scarce. Our method, referred to as language-model-based data augmentation (LAMBADA), fine-tunes a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Given a class label, the fine-tuned model generates new sentences for that class. Our process then filters these new sentences using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifier performance on a variety of datasets. Moreover, LAMBADA significantly improves upon state-of-the-art data augmentation techniques, specifically those applicable to text classification tasks with little data.
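The abstract describes a three-stage pipeline: fine-tune a generator on the small labeled set, sample new label-conditioned sentences, and filter them with a classifier trained on the original data. The sketch below illustrates the generation and filtering stages, assuming a GPT-2 generator fine-tuned on label-prefixed examples with the Hugging Face `transformers` library; the prompt format, model path, classifier interface, and confidence threshold are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a LAMBADA-style augmentation loop (generation + filtering).
# Assumes a GPT-2 model already fine-tuned on "label [SEP] text" sequences.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer


def generate_candidates(model, tokenizer, label, n=10, max_new_tokens=40):
    """Condition the fine-tuned language model on a class label and sample
    candidate sentences for that class."""
    prompt = f"{label} [SEP] "  # assumed prompt format used during fine-tuning
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    texts = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    # Strip the conditioning prefix so only the generated sentence remains.
    return [t.split("[SEP]", 1)[-1].strip() for t in texts]


def filter_candidates(candidates, label, classifier, threshold=0.9):
    """Keep only sentences that a classifier trained on the original data
    assigns to the intended label with high confidence (threshold is an
    assumed hyperparameter)."""
    kept = []
    for text in candidates:
        pred_label, confidence = classifier(text)  # hypothetical classifier interface
        if pred_label == label and confidence >= threshold:
            kept.append((text, label))
    return kept


# Usage (assuming "./lambada-finetuned" holds the fine-tuned generator and
# baseline_clf is a classifier trained on the original small labeled set):
# tokenizer = GPT2Tokenizer.from_pretrained("./lambada-finetuned")
# model = GPT2LMHeadModel.from_pretrained("./lambada-finetuned")
# new_data = filter_candidates(
#     generate_candidates(model, tokenizer, "sports"), "sports", baseline_clf)
```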
Original language | English
---|---
Title of host publication | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
Publisher | AAAI Press
Pages | 7383-7390
Number of pages | 8
ISBN (Electronic) | 9781577358350
State | Published - 2020
Event | 34th AAAI Conference on Artificial Intelligence, AAAI 2020, New York, United States. Duration: 7 Feb 2020 → 12 Feb 2020
Publication series
Name | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
---|---
Conference
Conference | 34th AAAI Conference on Artificial Intelligence, AAAI 2020
---|---
Country/Territory | United States
City | New York
Period | 7/02/20 → 12/02/20
Bibliographical note
Publisher Copyright: © 2020 The Twenty-Fifth AAAI/SIGAI Doctoral Consortium (AAAI-20). All Rights Reserved.
ASJC Scopus subject areas
- Artificial Intelligence