Abstract
We address the task of native language identification in the context of social media content, where the authors are highly fluent, advanced nonnative speakers of English. Using both linguistically motivated features and the characteristics of the social media outlet, we obtain high accuracy on this challenging task. We also provide a detailed analysis of the features, which sheds light on differences between native and nonnative speakers, and among nonnative speakers with different backgrounds.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
Editors | Ellen Riloff, David Chiang, Julia Hockenmaier, Jun'ichi Tsujii |
Publisher | Association for Computational Linguistics |
Pages | 3591-3601 |
Number of pages | 11 |
ISBN (Electronic) | 9781948087841 |
State | Published - 2018 |
Event | 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, Brussels, Belgium (31 Oct 2018 → 4 Nov 2018) |
Publication series
Name | Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
---|---|
Conference
Conference | 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 |
---|---|
Country/Territory | Belgium |
City | Brussels |
Period | 31/10/18 → 4/11/18 |
Bibliographical note
Funding Information: This research was supported by Grant No. 2017699 from the United States-Israel Binational Science Foundation (BSF) and by Grant No. 1813153 from the United States National Science Foundation (NSF).
Publisher Copyright:
© 2018 Association for Computational Linguistics
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems