Exploring multi-threaded Java application performance on multicore hardware

Authors:
Jennifer B. Sartor, Ghent University, Ghent, Belgium
Lieven Eeckhout, Ghent University, Ghent, Belgium

OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications, October 2012, Pages 281–296. https://doi.org/10.1145/2384616.2384638

Published: 19 October 2012

ABSTRACT

While there have been many studies of how to schedule applications to take advantage of the increasing number of cores in modern multicore processors, few have focused on multi-threaded managed-language applications, which are prevalent from the embedded to the server domain. Managed languages complicate performance studies because they have additional virtual machine threads that collect garbage and dynamically compile code, closely interacting with application threads. Further complexity arises because modern multicore machines have multiple sockets and dynamic frequency scaling options, broadening the opportunities to reduce both power and running time.

In this paper, we explore the performance of Java applications, studying how best to map application and virtual machine (JVM) threads to a multicore, multi-socket environment. We explore both the cost of separating JVM threads from application threads, and the opportunity to speed up or slow down the clock frequency of isolated threads. We perform experiments with the multi-threaded DaCapo benchmarks and pseudojbb2005 running on the Jikes Research Virtual Machine, on a dual-socket, 8-core Intel Nehalem machine to reveal several novel, and sometimes counter-intuitive, findings. We believe these insights are a first but important step towards understanding and optimizing managed language performance on modern hardware.

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Published in
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
October 2012
1052 pages
ISBN: 9781450315616
DOI: 10.1145/2384616
General Chair: Gary T. Leavens, University of Central Florida
Program Chair: Matthew B. Dwyer, University of Nebraska - Lincoln

ACM SIGPLAN Notices Volume 47, Issue 10
OOPSLA '12
October 2012
1011 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2398857
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
java
managed languages
multicore
performance analysis
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate: 268 of 1,244 submissions, 22%