Abstract
We tackle a major challenge of information filtering on social media (SM): rather than address the general question “what are people talking about on SM?”, we consider a finer question, “what are X talking about on SM?”, where X stands for some subpopulation of SM users of interest. We take a set expansion approach: a seed set of example members of the target subpopulation is defined first, and additional SM users who belong to that subpopulation are then identified, enabling effective tracking of the information on SM that pertains to that subpopulation. Specifically, the Personalized PageRank (PPR) random walk measure is applied iteratively to detect additional members of the subpopulation based on their structural similarity to the seed set within the social media graph. This work makes several main contributions. We outline Splash PPR, an efficient distributed computation of PPR adapted for potentially large seed sets and very large SM graphs. Using Splash PPR, we examine and tune graph representations towards the retrieval of two subpopulations from Twitter, namely human rights activists and machine learning practitioners. We believe this work is the first to introduce and evaluate a generic framework for subpopulation identification at scale.
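As a rough illustration of the set expansion idea described above, the following minimal sketch runs a single-machine power-iteration PPR from a seed set and then adds the highest-scoring non-members to the set over a few rounds. It is not the Splash PPR algorithm, which is distributed. The function names, the restart probability `alpha`, the iteration count, the dangling-node handling, and the `rounds` and `top_k` parameters are all assumptions made for this example.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.15, iterations=50):
    """Power-iteration PPR on a directed graph given as (source, target) pairs.

    alpha is the restart probability (an assumed default, not from the paper).
    """
    out_links = defaultdict(list)
    nodes = set(seeds)
    for src, dst in edges:
        out_links[src].append(dst)
        nodes.update((src, dst))

    # The walk restarts uniformly over the seed set.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)

    for _ in range(iterations):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            mass = (1.0 - alpha) * scores[n]
            targets = out_links.get(n)
            if targets:
                share = mass / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: return its mass to the seed set (assumption).
                for s in seeds:
                    nxt[s] += mass / len(seeds)
        scores = nxt
    return scores


def expand_seed_set(edges, seeds, rounds=2, top_k=1):
    """Iteratively add the top_k highest-scoring non-members to the set."""
    members = set(seeds)
    for _ in range(rounds):
        scores = personalized_pagerank(edges, members)
        candidates = sorted(
            (n for n in scores if n not in members),
            key=lambda n: (-scores[n], n),  # ties broken by node name
        )
        members.update(candidates[:top_k])
    return members


if __name__ == "__main__":
    # Toy follower graph: a, b, c form one community; d, e form another.
    toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"),
                 ("d", "e"), ("e", "d")]
    print(expand_seed_set(toy_edges, {"a"}, rounds=2, top_k=1))
```

In this toy graph the nodes `d` and `e` are unreachable from the seed, so they receive no score, while `b` and `c` are ranked by how much of the seed's random-walk mass reaches them.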
Original language | English |
---|---|
Pages (from-to) | 92-112 |
Number of pages | 21 |
Journal | Information Sciences |
Volume | 528 |
State | Published - Aug 2020 |
Bibliographical note
Publisher Copyright: © 2020 Elsevier Inc.
Keywords
- Cloud computing
- Event detection
- Personalized PageRank
- Set expansion
- Social network
- Subpopulation identification
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence