Deep-dive analysis of the data analytics workload in CloudSuite

Ahmad Yasin, Yosi Ben-Asher, Avi Mendelson

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Exponential growth of digital data has given rise to massively parallel systems, specialized orchestration layers, and new scale-out applications. While recent works suggest that the characteristics of scale-out workloads differ from those of traditional ones, their root causes are not understood. Such understanding is extremely important for improving efficiency: even a 1% performance gain for a core can have a large impact on the datacenter as a whole. This paper studies the characteristics of a Big Data Analytics (BDA) workload on a modern cloud server. It intentionally focuses on a single workload-platform pair in order to enable a deep-dive analysis aimed at understanding the root causes of the CPU bottlenecks that this paper identifies. We choose the Data Analytics benchmark from CloudSuite [1] as a representative of a growing family of important applications. This paper describes a customization of a comprehensive threefold analysis method. The method consists of a System level, where sensitivity to system parameters is examined, as well as Application and Architectural levels, where bottlenecks are attributed back to the application code and the runtime code, respectively. The paper also adopts a proof-by-optimization approach to validate the identified bottlenecks. Overall, a 65% net speedup is measured, along with a significant power reduction. The paper reveals that BDA workloads suffer from overheads related to managing the data rather than accessing it. For example, hash index lookup is found to be a key performance limiter. Inefficiencies leading to such unexpected behavior are demonstrated, including JVM selection and heavily unoptimized application code, both of which have a big impact. Suboptimal microarchitecture areas are demonstrated too, in addition to programming styles that limit exploitation of upcoming JVM and CPU parallelization features.
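To make the claim that even a 1% per-core gain matters at datacenter scale more concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the paper: the fleet size of 100,000 servers is hypothetical, and the helper names `servers_needed` and `servers_saved` are illustrative. It also assumes that throughput scales linearly with server count and that the speedup applies uniformly to every server.

```python
# Hypothetical fleet-sizing arithmetic (not from the paper): how a per-server
# speedup changes the number of servers needed to sustain a fixed workload.
# Assumes throughput scales linearly with server count.

def servers_needed(fleet_size: int, speedup: float) -> float:
    """Servers required for the same total throughput after a per-server speedup.

    `speedup` is a throughput ratio, e.g. 1.01 means each server is 1% faster.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return fleet_size / speedup


def servers_saved(fleet_size: int, speedup: float) -> float:
    """Servers freed by the speedup, under the same linear-scaling assumption."""
    return fleet_size - servers_needed(fleet_size, speedup)


if __name__ == "__main__":
    fleet = 100_000  # hypothetical datacenter size
    print(f"1% per-server gain frees ~{servers_saved(fleet, 1.01):,.0f} servers")
```

Under these assumptions, a 1% gain frees roughly 990 of 100,000 servers (100,000 × (1 − 1/1.01)). Real savings depend on how well the workload scales and on which resources actually bind.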

Original language: English
Title of host publication: IISWC 2014 - IEEE International Symposium on Workload Characterization
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 202-211
Number of pages: 10
ISBN (Electronic): 9781479964536
State: Published - 11 Dec 2014
Event: 2014 IEEE International Symposium on Workload Characterization, IISWC 2014 - Raleigh, United States
Duration: 26 Oct 2014 to 28 Oct 2014

Publication series

Name: IISWC 2014 - IEEE International Symposium on Workload Characterization

Conference

Conference: 2014 IEEE International Symposium on Workload Characterization, IISWC 2014
Country/Territory: United States
City: Raleigh
Period: 26/10/14 to 28/10/14

Bibliographical note

Publisher Copyright:
© 2014 IEEE.

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Software
  • Electrical and Electronic Engineering
  • Control and Systems Engineering
