Towards Hate Speech Detection at Large via Deep Generative Modeling

Tomer Wullach, Amir Adler, Einat Minkov

Research output: Contribution to journal · Article · peer-review

Abstract

Hate speech detection is a critical problem in social media, which is often accused of enabling the spread of hatred and igniting violence. Hate speech detection requires overwhelming computing resources for online monitoring as well as thousands of human experts for daily screening of suspected posts or tweets. Recently, deep learning (DL)-based solutions have been proposed for hate speech detection, using modest-sized datasets of a few thousand sequences. While these methods perform well on the specific datasets, their ability to generalize to new hate speech sequences is limited. DL is a data-driven approach and is known to surpass other methods whenever scale-up in training-set size and diversity is achieved. Therefore, we first present a dataset of 1 million hate and nonhate sequences, produced by a deep generative model. We further utilize the generated data to train a well-studied DL detector, demonstrating significant performance improvements across five hate speech datasets.

Original language: English
Article number: 9238420
Pages (from-to): 48-57
Number of pages: 10
Journal: IEEE Internet Computing
Volume: 25
Issue number: 2
State: Published - 1 Mar 2021

Bibliographical note

Publisher Copyright:
© 1997-2012 IEEE.

ASJC Scopus subject areas

  • Computer Networks and Communications
