Abstract
Exponential growth of digital data has introduced massively parallel systems, special orchestration layers, and new scale-out applications. While recent works suggest that the characteristics of scale-out workloads differ from those of traditional ones, the root causes of these differences are not understood. Such understanding is extremely important for improving efficiency; even a 1% performance gain for a core can have a large impact on the datacenter as a whole. This paper studies the characteristics of a Big Data Analytics (BDA) workload on a modern cloud server. It intentionally focuses on a single workload-platform pair to enable a deep-dive analysis of the root causes of the CPU bottlenecks that it identifies. We choose the Data Analytics benchmark from CloudSuite [1] as a representative of a growing family of important applications. This paper describes a customization of a comprehensive threefold analysis method. The method consists of a System level, where sensitivity to system parameters is examined, and of Application and Architectural levels, where bottlenecks are attributed back to the application and runtime codes, respectively. The paper also adopts a proof-by-optimization approach to validate the identified bottlenecks. Overall, a net speedup of 65% is measured, along with a significant power reduction. The paper reveals that BDA workloads suffer from overheads related to managing the data rather than to accessing it. For example, hash index lookup is found to be a key performance limiter. Inefficiencies leading to such unexpected behavior are demonstrated, including JVM selection and heavily unoptimized application code, both of which have a large impact. Suboptimal microarchitecture areas are demonstrated too, in addition to programming styles that limit the exploitation of upcoming JVM and CPU parallelization features.
Original language | English |
---|---|
Title of host publication | IISWC 2014 - IEEE International Symposium on Workload Characterization |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 202-211 |
Number of pages | 10 |
ISBN (Electronic) | 9781479964536 |
State | Published - 11 Dec 2014 |
Event | 2014 IEEE International Symposium on Workload Characterization, IISWC 2014 - Raleigh, United States; Duration: 26 Oct 2014 → 28 Oct 2014 |
Publication series
Name | IISWC 2014 - IEEE International Symposium on Workload Characterization |
---|
Conference
Conference | 2014 IEEE International Symposium on Workload Characterization, IISWC 2014 |
---|---|
Country/Territory | United States |
City | Raleigh |
Period | 26/10/14 → 28/10/14 |
Bibliographical note
Publisher Copyright: © 2014 IEEE.
ASJC Scopus subject areas
- Computer Science Applications
- Hardware and Architecture
- Software
- Electrical and Electronic Engineering
- Control and Systems Engineering