Abstract
The current trend of moving from homogeneous to heterogeneous multicore systems provides compelling opportunities for meeting performance and energy-efficiency goals. Running multiple threads on multicore systems, however, makes it challenging to stay within limited shared resources, such as memory bandwidth. We propose an optimization approach that combines an Integer Linear Programming (ILP) model with a scheme that dynamically determines the thread-to-core assignment. We present a simulation analysis that shows energy savings and performance gains for a variety of workloads compared to state-of-the-art schemes. We also implemented and evaluated a prototype of our thread assignment approach at user level, leveraging Linux scheduling and performance-monitoring capabilities.
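To make the kind of problem such a model addresses concrete, here is a minimal sketch of a thread-to-core assignment under a shared memory-bandwidth cap. It is not the paper's formulation. The objective (minimize total power), the per-thread minimum-throughput constraint, the two core types, all numbers, and the names `Profile`, `PROFILES`, and `best_assignment` are assumptions made for illustration. An exhaustive search stands in for an ILP solver, which is only practical for a handful of threads.

```python
from itertools import product
from typing import NamedTuple


class Profile(NamedTuple):
    ips: float        # assumed throughput of a thread on a core type (arbitrary units)
    power: float      # assumed power draw in watts
    bandwidth: float  # assumed memory-bandwidth demand in GB/s


# Hypothetical per-thread profiles on each core type (illustrative numbers only).
PROFILES = {
    "t0": {"big": Profile(4.0, 3.0, 2.0), "little": Profile(2.0, 1.0, 1.0)},
    "t1": {"big": Profile(3.0, 3.0, 4.0), "little": Profile(2.5, 1.0, 3.0)},
    "t2": {"big": Profile(5.0, 3.5, 1.0), "little": Profile(1.5, 1.0, 0.5)},
}
CORES = {"big": 2, "little": 2}                # cores per type, one thread per core
MIN_IPS = {"t0": 3.0, "t1": 2.0, "t2": 1.0}    # assumed per-thread throughput floor
BANDWIDTH_CAP = 6.0                            # assumed shared bandwidth limit in GB/s


def best_assignment(profiles, cores, min_ips, bandwidth_cap):
    """Return (assignment, total_power) with the lowest feasible total power.

    Returns (None, inf) when no assignment satisfies every constraint.
    """
    threads = sorted(profiles)
    best, best_power = None, float("inf")
    # Enumerate every mapping of threads to core types.
    for choice in product(cores, repeat=len(threads)):
        # Capacity: no core type may host more threads than it has cores.
        if any(choice.count(kind) > count for kind, count in cores.items()):
            continue
        picked = [profiles[t][kind] for t, kind in zip(threads, choice)]
        # Performance: each thread must reach its throughput floor.
        if any(p.ips < min_ips[t] for t, p in zip(threads, picked)):
            continue
        # Shared resource: total bandwidth demand must fit under the cap.
        if sum(p.bandwidth for p in picked) > bandwidth_cap:
            continue
        power = sum(p.power for p in picked)
        if power < best_power:
            best, best_power = dict(zip(threads, choice)), power
    return best, best_power


if __name__ == "__main__":
    assignment, power = best_assignment(PROFILES, CORES, MIN_IPS, BANDWIDTH_CAP)
    print(assignment, power)
```

With these made-up inputs, `t0` can only meet its throughput floor on a big core, and placing `t1` on a big core as well would exceed the bandwidth cap, so the search settles on `t0` on a big core and the other two threads on little cores. In an ILP, the same three checks would be written as linear constraints over binary assignment variables.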
Index Terms
- Energy-Efficient Thread Assignment Optimization for Heterogeneous Multicore Systems