Identification of topical subpopulations on social media

Research output: Contribution to journal › Article › peer-review


We tackle a major challenge of information filtering on social media (SM): rather than address the general question “what are people talking about on SM?”, we consider a finer question, “what are X talking about on SM?”, where X stands for some subpopulation of SM users of interest. We take a set expansion approach, where a seed of example members of the target subpopulation is initially defined, and additional SM users who belong to that subpopulation are identified, thus enabling the effective tracking of relevant information that pertains to that subpopulation on SM. Specifically, the Personalized PageRank (PPR) random walk measure is iteratively applied to detect additional members of the subpopulation based on their structural similarity to the seed set within the social media graph. There are several main contributions of this work. We outline Splash PPR, an efficient distributed computation of PPR adapted for potentially large seed sets and very large SM graphs. Using Splash PPR, we examine and tune graph representations towards the retrieval of two subpopulations from Twitter, namely human rights activists and machine learning practitioners. We believe this work is the first to introduce and evaluate a generic framework for subpopulation identification at scale.
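
The iterative set expansion described in the abstract can be illustrated with a small single-machine sketch. This is not the Splash PPR computation from the article, which is distributed and whose details are not given on this page; it is a plain power-iteration Personalized PageRank over an edge list, followed by a loop that adds the top-scoring non-members to the seed set. The function names, the restart probability `alpha`, and the `rounds` and `top_k` parameters are assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.15, iterations=50):
    """Power-iteration PPR over directed (src, dst) edges, restarting at `seeds`."""
    out = defaultdict(list)
    nodes = set(seeds)
    for u, v in edges:
        out[u].append(v)
        nodes.update((u, v))

    # Restart distribution: uniform over the seed set, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)

    for _ in range(iterations):
        nxt = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            targets = out[u]
            if targets:
                share = (1 - alpha) * scores[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Dangling node: send its walk mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - alpha) * scores[u] / len(seeds)
        scores = nxt
    return scores


def expand_seed_set(edges, seeds, rounds=2, top_k=1, alpha=0.15):
    """Repeatedly add the `top_k` highest-scoring non-members to the set."""
    members = set(seeds)
    for _ in range(rounds):
        scores = personalized_pagerank(edges, members, alpha)
        candidates = sorted(
            (n for n in scores if n not in members),
            key=lambda n: (-scores[n], n),  # ties broken by node name
        )
        members.update(candidates[:top_k])
    return members


# Toy follower graph (hypothetical): a tight cluster a, b, c with d attached
# to c, plus an unrelated pair x, y.
toy_edges = [
    ("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"),
    ("b", "c"), ("c", "b"), ("c", "d"), ("d", "c"),
    ("x", "y"), ("y", "x"),
]
expanded = expand_seed_set(toy_edges, {"a", "b"}, rounds=1, top_k=1)
```

In this toy graph, one round starting from the seeds `a` and `b` adds `c`, the node most strongly connected to both seeds, while the disconnected pair `x`, `y` receives no score. How many users to add per round and when to stop are tuning choices that this sketch leaves open.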

Original language: English
Pages (from-to): 92-112
Number of pages: 21
Journal: Information Sciences
State: Published - Aug 2020

Bibliographical note

Publisher Copyright:
© 2020 Elsevier Inc.


  • Cloud computing
  • Event detection
  • Personalized pagerank
  • Set expansion
  • Social network
  • Subpopulation identification

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

