Abstract
Understanding the performance of large, complex enterprise-class applications is an important, yet nontrivial task. Methods using hardware performance counters, such as profiling through event-based sampling, are often favored over instrumentation for analyzing such large codes, but rarely provide good accuracy at the instruction level. This work evaluates the accuracy of multiple event-based sampling techniques and quantifies the impact of a range of improvements suggested in recent years. The evaluation is performed on instances of three modern CPU architectures, using designated kernels and full applications. We conclude that precisely distributed events considerably improve accuracy, with further improvements possible when using Last Branch Records. We also present practical recommendations for hardware architects, tool developers and performance engineers, aimed at improving the quality of results.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2015 USENIX Annual Technical Conference, USENIX ATC 2015 |
Publisher | USENIX Association |
Pages | 541-548 |
Number of pages | 8 |
ISBN (Electronic) | 9781931971225 |
State | Published - 2015 |
Externally published | Yes |
Event | 2015 USENIX Annual Technical Conference, USENIX ATC 2015 - Santa Clara, United States; Duration: 8 Jul 2015 → 10 Jul 2015 |
Publication series
Name | Proceedings of the 2015 USENIX Annual Technical Conference, USENIX ATC 2015 |
---|
Conference
Conference | 2015 USENIX Annual Technical Conference, USENIX ATC 2015 |
---|---|
Country/Territory | United States |
City | Santa Clara |
Period | 8/07/15 → 10/07/15 |
Bibliographical note
Publisher Copyright: © 2015 USENIX Annual Technical Conference.
ASJC Scopus subject areas
- General Computer Science