Sampling operations on big data

Vijay Gadepally, Taylor Herr, Luke Johnson, Lauren Milechin, Maja Milosavljevic, Benjamin A. Miller

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The 3Vs - Volume, Velocity and Variety - of Big Data continues to be a large challenge for systems and algorithms designed to store, process and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and filtering, which have been used for decades to compress signals are often undefined in data that is characterized by heterogeneity, high dimensionality, and lack of known structure. In this article, we describe and demonstrate an approach to sample large datasets such as social media data. We evaluate the effect of sampling on a common predictive analytic: link prediction. Our results indicate that greatly sampling a dataset can still yield meaningful link prediction results.

Original languageEnglish
Title of host publicationConference Record of the 49th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015
EditorsMichael B. Matthews
PublisherIEEE Computer Society
Pages1515-1519
Number of pages5
ISBN (Electronic)9781467385763
DOIs
StatePublished - 26 Feb 2016
Externally publishedYes
Event49th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015 - Pacific Grove, United States
Duration: 8 Nov 201511 Nov 2015

Publication series

NameConference Record - Asilomar Conference on Signals, Systems and Computers
Volume2016-February
ISSN (Print)1058-6393

Conference

Conference49th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015
Country/TerritoryUnited States
CityPacific Grove
Period8/11/1511/11/15

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

ASJC Scopus subject areas

  • Signal Processing
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Sampling operations on big data'. Together they form a unique fingerprint.

Cite this