ABSTRACT
While there have been many studies of how to schedule applications to take advantage of the increasing number of cores in modern multicore processors, few have focused on multi-threaded managed-language applications, which are prevalent from the embedded to the server domain. Managed languages complicate performance studies because they have additional virtual machine threads that collect garbage and dynamically compile code, and these threads interact closely with application threads. Further complexity arises because modern multicore machines have multiple sockets and dynamic frequency scaling options, broadening opportunities to reduce both power and running time.
In this paper, we explore the performance of Java applications, studying how best to map application and Java virtual machine (JVM) threads to a multicore, multi-socket environment. We explore both the cost of separating JVM threads from application threads and the opportunity to scale the clock frequency up or down for isolated threads. We perform experiments with the multi-threaded DaCapo benchmarks and pseudojbb2005 running on the Jikes Research Virtual Machine on a dual-socket, 8-core Intel Nehalem machine, and reveal several novel, and sometimes counter-intuitive, findings. We believe these insights are a first but important step towards understanding and optimizing managed-language performance on modern hardware.
Exploring multi-threaded Java application performance on multicore hardware
OOPSLA '12