We tackle a major challenge of information filtering on social media (SM): rather than address the general question “what are people talking about on SM?”, we consider a finer question, “what are X talking about on SM?”, where X stands for some subpopulation of SM users of interest. We take a set expansion approach: a seed set of example members of the target subpopulation is defined first, and additional SM users who belong to that subpopulation are then identified, which enables effective tracking of the information on SM that pertains to that subpopulation. Specifically, the Personalized PageRank (PPR) random walk measure is applied iteratively to detect additional members of the subpopulation based on their structural similarity to the seed set within the social media graph. This work makes several main contributions. We outline Splash PPR, an efficient distributed computation of PPR adapted for potentially large seed sets and very large SM graphs. Using Splash PPR, we examine and tune graph representations for the retrieval of two subpopulations from Twitter, namely human rights activists and machine learning practitioners. We believe this work is the first to introduce and evaluate a generic framework for subpopulation identification at scale.
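The iterative set expansion described above can be illustrated with a minimal single-machine sketch. This is not Splash PPR, which is a distributed computation whose details are not given here; it is plain power-iteration PPR on a toy graph. The function names, the restart probability `alpha`, the `rounds` and `top_k` parameters, and the example graph are all assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.15, iterations=50):
    """Power-iteration PPR on a directed graph; restarts jump to the seed set."""
    out_links = defaultdict(list)
    nodes = set(seeds)
    for src, dst in edges:
        out_links[src].append(dst)
        nodes.update((src, dst))
    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            targets = out_links.get(n)
            if targets:
                share = (1 - alpha) * scores[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - alpha) * scores[n] / len(seeds)
        scores = nxt
    return scores


def expand_seed_set(edges, seeds, rounds=2, top_k=1):
    """Repeatedly add the top_k highest-scoring non-members to the set."""
    members = set(seeds)
    for _ in range(rounds):
        scores = personalized_pagerank(edges, members)
        candidates = sorted(
            (n for n in scores if n not in members),
            key=lambda n: (-scores[n], n),  # ties broken by node name
        )
        members.update(candidates[:top_k])
    return members


# Hypothetical toy graph: two communities (a*, b*) joined by one link.
undirected = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"), ("a2", "a4"), ("a3", "a4"),
    ("a4", "b1"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
]
# Treat each link as a pair of directed edges.
edges = [(u, v) for u, v in undirected] + [(v, u) for u, v in undirected]

expanded = expand_seed_set(edges, {"a1", "a2"}, rounds=2, top_k=1)
print(sorted(expanded))
```

In this sketch each round recomputes PPR with the enlarged member set as the restart distribution, so newly added members influence the next round. How many candidates to accept per round, and when to stop, are design choices that the abstract does not specify.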
Bibliographical note
Funding Information:
We wish to thank the reviewers for their useful comments. This work was supported by the Infomedia Magnet grant of the Israeli Ministry of Economy.
© 2020 Elsevier Inc.
- Cloud computing
- Event detection
- Personalized PageRank
- Set expansion
- Social network
- Subpopulation identification
ASJC Scopus subject areas
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence