research-article

Energy-Efficient Thread Assignment Optimization for Heterogeneous Multicore Systems

Authors:
Vinicius Petrucci, University of Michigan
Orlando Loques, Universidade Federal Fluminense, Brazil
Daniel Mossé, University of Pittsburgh, USA
Rami Melhem, University of Pittsburgh, USA
Neven Abou Gazala, Intel Corporation, USA
Sameh Gobriel, Intel Corporation

ACM Transactions on Embedded Computing Systems, Volume 14, Issue 1, Article No. 15, pp. 1–26. https://doi.org/10.1145/2566618

Published: 21 January 2015

Abstract

The current trend to move from homogeneous to heterogeneous multicore systems provides compelling opportunities for achieving performance and energy efficiency goals. Running multiple threads on multicore systems makes it challenging to stay within limited shared resources, such as memory bandwidth. We propose an optimization approach that includes an Integer Linear Programming (ILP) optimization model and a scheme to dynamically determine thread-to-core assignment. We present simulation analysis that shows energy savings and performance gains for a variety of workloads compared to state-of-the-art schemes. We implemented and evaluated a prototype of our thread assignment approach at user level, leveraging Linux scheduling and performance-monitoring capabilities.
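To make the kind of problem described above concrete, the following is a minimal sketch of a thread-to-core assignment search under a shared memory bandwidth limit. It is not the authors' ILP model, which this page does not reproduce. It swaps in exhaustive search over a toy instance, and every name and number in it (`assign_threads`, `THREADS`, `CORE_TYPES`, the power, bandwidth, and performance figures) is a hypothetical assumption chosen for illustration.

```python
from itertools import permutations

# Hypothetical per-thread profiles, indexed by core type. None of these
# numbers come from the paper; they only illustrate the trade-off.
THREADS = [
    {"power": {"big": 10, "small": 4}, "bw": {"big": 6, "small": 3},
     "perf": {"big": 2.0, "small": 1.0}, "min_perf": 1.5},
    {"power": {"big": 8, "small": 3}, "bw": {"big": 4, "small": 2},
     "perf": {"big": 1.8, "small": 1.2}, "min_perf": 1.0},
]

# Hypothetical platform: one big core and two small cores.
CORE_TYPES = ["big", "small", "small"]


def assign_threads(threads, core_types, bw_limit):
    """Brute-force search for the lowest-power feasible assignment.

    Each thread gets its own core. An assignment is feasible when every
    thread meets its minimum performance and the summed bandwidth demand
    stays within bw_limit. Returns (total_power, cores), where cores[i]
    is the core index for thread i, or (None, None) if nothing is feasible.
    """
    best_cost, best = None, None
    for cores in permutations(range(len(core_types)), len(threads)):
        kinds = [core_types[c] for c in cores]
        # Performance constraint: each thread must run fast enough.
        if any(t["perf"][k] < t["min_perf"] for t, k in zip(threads, kinds)):
            continue
        # Shared-resource constraint: total memory bandwidth is capped.
        if sum(t["bw"][k] for t, k in zip(threads, kinds)) > bw_limit:
            continue
        cost = sum(t["power"][k] for t, k in zip(threads, kinds))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, cores
    return best_cost, best


if __name__ == "__main__":
    print(assign_threads(THREADS, CORE_TYPES, bw_limit=8))
```

With these assumed inputs, the first thread needs the big core to meet its performance floor, so the search places it there and puts the second thread on a small core. Lowering `bw_limit` to 7 leaves no feasible assignment. Exhaustive search only scales to a handful of threads; an ILP solver would be the natural tool for larger instances.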



Published in
ACM Transactions on Embedded Computing Systems, Volume 14, Issue 1
January 2015
443 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/2724585
Editor: Sandeep K. Shukla, Virginia Tech, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 21 January 2015
- Revised: 1 December 2013
- Accepted: 1 December 2013
- Received: 1 July 2012

Author Tags
Task scheduling
energy efficiency
heterogeneous multicores
memory bandwidth
optimization
real-time performance
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 27
- Total Downloads: 633
- Downloads (Last 12 months): 21
- Downloads (Last 6 weeks): 4