research-article

Energy-Efficient Thread Assignment Optimization for Heterogeneous Multicore Systems

Published: 21 January 2015

Abstract

The current trend of moving from homogeneous to heterogeneous multicore systems provides compelling opportunities for achieving performance and energy efficiency goals. Running multiple threads on multicore systems, however, poses challenges because shared resources such as memory bandwidth are limited. We propose an optimization approach that combines an Integer Linear Programming (ILP) optimization model with a scheme that dynamically determines thread-to-core assignment. We present a simulation analysis that shows energy savings and performance gains for a variety of workloads compared to state-of-the-art schemes. We also implemented and evaluated a prototype of our thread assignment approach at user level, leveraging Linux scheduling and performance-monitoring capabilities.
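To make the idea of constrained thread-to-core assignment concrete, the following is a minimal sketch and not the model from the article, whose formulation is not given in this abstract. It assumes each core runs at most one thread, each thread has a minimum throughput requirement, and total memory bandwidth demand must stay under a shared limit. An exhaustive search stands in for an ILP solver, and the function name `assign_threads` and all numbers are hypothetical.

```python
from itertools import permutations


def assign_threads(power, ips, bw, min_ips, bw_limit):
    """Return (assignment, total_power) with minimum total power, or None.

    power[t][c], ips[t][c], bw[t][c] give the (hypothetical) power draw,
    throughput, and memory bandwidth demand of thread t on core c.
    assignment[t] is the core chosen for thread t.
    """
    n_threads = len(power)
    n_cores = len(power[0])
    best = None
    # permutations() enforces "at most one thread per core".
    for assignment in permutations(range(n_cores), n_threads):
        # Per-thread performance constraint.
        if any(ips[t][c] < min_ips[t] for t, c in enumerate(assignment)):
            continue
        # Shared memory bandwidth constraint.
        if sum(bw[t][c] for t, c in enumerate(assignment)) > bw_limit:
            continue
        total = sum(power[t][c] for t, c in enumerate(assignment))
        if best is None or total < best[1]:
            best = (assignment, total)
    return best


if __name__ == "__main__":
    # Illustrative values only: core 0 is "big", cores 1 and 2 are "small".
    power = [[10, 4, 4], [9, 3, 3]]
    ips = [[2.0, 1.0, 1.0], [1.5, 1.2, 1.2]]
    bw = [[3.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
    print(assign_threads(power, ips, bw, min_ips=[1.5, 1.0], bw_limit=4.0))
```

Exhaustive search is only practical for a handful of threads and cores. A real deployment would hand the same objective and constraints to an ILP solver and re-solve as measured thread behavior changes.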

Published in

ACM Transactions on Embedded Computing Systems, Volume 14, Issue 1
January 2015
443 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/2724585

Copyright © 2015 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 21 January 2015
• Revised: 1 December 2013
• Accepted: 1 December 2013
• Received: 1 July 2012
