Abstract
The characteristics of Big Data - often dubbed the 3Vs for volume, velocity, and variety - will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional techniques for dealing with large datasets often include the purchase of larger systems, greater human-in-the-loop involvement, or more complex algorithms. We are investigating the use of sampling to mitigate these challenges, specifically the sampling of large graphs. Often, large datasets can be represented as graphs in which data entries are edges and attributes of the data are vertices. In particular, we present the results of sampling for the task of link prediction. Link prediction estimates the probability of a new edge forming between two vertices of a graph, and it has numerous applications in understanding social or biological networks. In this paper, we propose a series of techniques for sampling large datasets. To quantify the effect of these techniques, we present the quality of link prediction on sampled graphs and the time saved in calculating link prediction statistics on these sampled graphs.
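The abstract does not name the sampling techniques or the link prediction statistics used in the paper. As a rough illustration of the general idea only, the sketch below assumes uniform random edge sampling and the Jaccard coefficient of neighbor sets as the link prediction score. The function names (`sample_edges`, `adjacency`, `jaccard_scores`) and the toy graph are hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations


def sample_edges(edges, fraction, seed=0):
    """Keep a uniformly random fraction of the edges (assumed sampling scheme)."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(edges)))
    return rng.sample(edges, k)


def adjacency(edges):
    """Build an undirected adjacency map: vertex -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_scores(edges):
    """Score each non-adjacent vertex pair by the Jaccard coefficient
    of the two neighbor sets (assumed link prediction statistic)."""
    adj = adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # an edge already exists, so there is nothing to predict
        union = adj[u] | adj[v]
        scores[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return scores


# Hypothetical toy graph; a real dataset would be far larger.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]

full_scores = jaccard_scores(edges)        # statistics on the full graph
sampled = sample_edges(edges, 0.5, seed=1) # keep about half of the edges
sampled_scores = jaccard_scores(sampled)   # statistics on the sampled graph
```

Comparing `sampled_scores` with `full_scores` for the vertex pairs they share, and timing both calls, gives a small-scale version of the quality-versus-time trade-off the abstract describes.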
Original language | English |
---|---|
Title of host publication | 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781467392860 |
State | Published - 9 Nov 2015 |
Externally published | Yes |
Event | IEEE High Performance Extreme Computing Conference, HPEC 2015 - Waltham, United States; Duration: 15 Sep 2015 → 17 Sep 2015 |
Publication series
Name | 2015 IEEE High Performance Extreme Computing Conference, HPEC 2015 |
---|---|
Conference
Conference | IEEE High Performance Extreme Computing Conference, HPEC 2015 |
---|---|
Country/Territory | United States |
City | Waltham |
Period | 15/09/15 → 17/09/15 |
Bibliographical note
Publisher Copyright: © 2015 IEEE.
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Information Systems