The characteristics of Big Data - often dubbed the 3Vs: volume, velocity, and variety - will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional approaches to large datasets include purchasing larger systems, increasing human-in-the-loop involvement, or devising more complex algorithms. We investigate sampling as a way to mitigate these challenges, specifically the sampling of large graphs. Large datasets can often be represented as graphs, where data entries may become edges and data attributes may become vertices. In particular, we present sampling results for the task of link prediction. Link prediction estimates the probability that a new edge will form between two vertices of a graph, and it has numerous applications in understanding social and biological networks. In this paper we propose a series of techniques for sampling large datasets. To quantify their effect, we report the quality of link prediction on the sampled graphs and the time saved in computing link-prediction statistics on them.
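The two ingredients the abstract describes - sampling a large graph and computing link-prediction statistics on the sample - can be sketched as follows. This is a minimal illustration only: it assumes uniform random edge sampling and the classic common-neighbors score, neither of which is claimed to be the specific technique evaluated in the paper.

```python
import random
from itertools import combinations

def sample_edges(edges, fraction, seed=0):
    """Uniform random edge sampling: keep a fixed fraction of edges.
    (Illustrative stand-in; the paper's actual sampling techniques
    are not reproduced here.)"""
    rng = random.Random(seed)
    k = int(len(edges) * fraction)
    return rng.sample(list(edges), k)

def common_neighbors_scores(edges):
    """Score each non-adjacent vertex pair by its number of common
    neighbors, a standard link-prediction statistic: pairs with more
    shared neighbors are more likely to form an edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    edge_set = {frozenset(e) for e in edges}
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if frozenset((u, v)) not in edge_set:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores

# Tiny example: score candidate links on a 4-vertex graph,
# then on a 50% edge sample of it.
edges = [(1, 2), (2, 3), (3, 4), (1, 3)]
full_scores = common_neighbors_scores(edges)
sampled_scores = common_neighbors_scores(sample_edges(edges, 0.5))
```

Running the predictor on the sampled graph touches fewer edges, which is the source of the time savings the paper quantifies; the open question it studies is how much prediction quality degrades in exchange.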
Title of host publication: 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
State: Published - 9 Nov 2015
Event: IEEE High Performance Extreme Computing Conference, HPEC 2015 - Waltham, United States
Duration: 15 Sep 2015 → 17 Sep 2015
Bibliographical note: Publisher Copyright © 2015 IEEE.
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Information Systems