Sampling large graphs for anticipatory analytics

Lauren Edwards, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin Alexandre Miller

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review


The characteristics of Big Data - often dubbed the 3V's for volume, velocity, and variety - will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional techniques for dealing with large datasets often include the purchase of larger systems, greater human-in-the-loop involvement, or more complex algorithms. We are investigating the use of sampling to mitigate these challenges, specifically sampling large graphs. Often, large datasets can be represented as graphs where data entries may be edges, and vertices may be attributes of the data. In particular, we present the results of sampling for the task of link prediction. Link prediction is a process to estimate the probability of a new edge forming between two vertices of a graph, and it has numerous application areas in understanding social or biological networks. In this paper we propose a series of techniques for the sampling of large datasets. In order to quantify the effect of these techniques, we present the quality of link prediction tasks on sampled graphs, and the time saved in calculating link prediction statistics on these sampled graphs.
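The abstract does not say which sampling techniques or link prediction statistics the paper evaluates, so the following is only a minimal sketch under stated assumptions: uniform random edge sampling stands in for the sampling step, and the Jaccard coefficient of neighbor sets stands in for the link prediction score. The function names (sample_edges, build_adjacency, jaccard_scores) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random
from itertools import combinations


def sample_edges(edges, fraction, seed=0):
    """Keep a uniform random subset of the edges (assumed sampling scheme)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    k = max(1, int(len(edges) * fraction))
    return rng.sample(edges, k)


def build_adjacency(edges):
    """Build an undirected adjacency map: vertex -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_scores(adj):
    """Score each non-adjacent vertex pair by the Jaccard coefficient
    of their neighbor sets (assumed link prediction statistic).
    Pairs with no common neighbor are omitted."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # an edge already exists, nothing to predict
        common = adj[u] & adj[v]
        if common:
            scores[(u, v)] = len(common) / len(adj[u] | adj[v])
    return scores
```

Scoring the full edge list and a sampled edge list with the same function, then comparing both the ranked pairs and the wall-clock time, gives a rough picture of the quality versus time trade-off the abstract describes. The all-pairs loop is quadratic in the number of vertices, which is acceptable for a small illustration but not for graphs at the scale the paper targets.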

Original language: English
Title of host publication: 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467392860
State: Published - 9 Nov 2015
Externally published: Yes
Event: IEEE High Performance Extreme Computing Conference, HPEC 2015 - Waltham, United States
Duration: 15 Sep 2015 to 17 Sep 2015

Publication series

Name: 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015


Conference: IEEE High Performance Extreme Computing Conference, HPEC 2015
Country/Territory: United States

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems


