DOI: 10.1145/2384616.2384638
research-article

Exploring multi-threaded Java application performance on multicore hardware

Published: 19 October 2012

ABSTRACT

While there have been many studies of how to schedule applications to take advantage of increasing numbers of cores in modern-day multicore processors, few have focused on multi-threaded managed language applications which are prevalent from the embedded to the server domain. Managed languages complicate performance studies because they have additional virtual machine threads that collect garbage and dynamically compile, closely interacting with application threads. Further complexity is introduced as modern multicore machines have multiple sockets and dynamic frequency scaling options, broadening opportunities to reduce both power and running time.

In this paper, we explore the performance of Java applications, studying how best to map application and virtual machine (JVM) threads to a multicore, multi-socket environment. We explore both the cost of separating JVM threads from application threads, and the opportunity to speed up or slow down the clock frequency of isolated threads. We perform experiments with the multi-threaded DaCapo benchmarks and pseudojbb2005 running on the Jikes Research Virtual Machine, on a dual-socket, 8-core Intel Nehalem machine to reveal several novel, and sometimes counter-intuitive, findings. We believe these insights are a first but important step towards understanding and optimizing managed language performance on modern hardware.


      Published in

      OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
      October 2012, 1052 pages
      ISBN: 9781450315616
      DOI: 10.1145/2384616

      ACM SIGPLAN Notices, Volume 47, Issue 10 (OOPSLA '12)
      October 2012, 1011 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2398857

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 268 of 1,244 submissions, 22%
