ABSTRACT
The trend towards many-core systems brings various challenges, among them highly dynamic and unpredictable workloads. New paradigms for managing the resources of many-core systems are therefore of paramount importance. The underlying problem, e.g. mapping applications to processor cores, is NP-hard, however, and requires heuristics, especially when solved online. In this paper, we present a novel resource-management scheme that supports so-called malleable applications, i.e. applications that can adapt their degree of parallelism to the resources assigned to them. By design, our decentralized scheme is scalable and copes with the computational complexity by focusing on local decision-making. Our simulations show that the quality of the mapping decisions of our approach stays close to that of state-of-the-art centralized online schemes for malleable applications, at a significantly reduced communication overhead (only about 12.75% on a 1024-core system with a total workload of 32 multi-threaded applications). In addition, our approach is scalable, as opposed to a centralized scheme, and is therefore practically useful for large many-core systems, as our extensive studies and experiments show.
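The notion of a malleable application, central to the abstract, can be illustrated with a minimal sketch. The class and method names below (`MalleableApp`, `on_resize`) are hypothetical and are not part of the paper's actual interface; the sketch only shows the defining property of malleability, namely that the application resizes its own degree of parallelism whenever the resource manager changes its core allocation.

```python
from concurrent.futures import ThreadPoolExecutor

class MalleableApp:
    """Hypothetical sketch of a malleable application: it adapts its
    degree of parallelism to whatever cores the manager assigns."""

    def __init__(self, assigned_cores):
        self.assigned_cores = list(assigned_cores)
        # One worker per assigned core.
        self.pool = ThreadPoolExecutor(max_workers=len(self.assigned_cores))

    def on_resize(self, new_cores):
        # Called when the (decentralized) resource manager changes the
        # allocation; the app shrinks or grows its worker pool accordingly.
        self.pool.shutdown(wait=True)
        self.assigned_cores = list(new_cores)
        self.pool = ThreadPoolExecutor(max_workers=len(self.assigned_cores))

    def run(self, work_items, worker):
        # Process the work items with exactly as many workers as cores.
        return list(self.pool.map(worker, work_items))

app = MalleableApp(assigned_cores=[0, 1, 2, 3])
squares = app.run(range(8), lambda x: x * x)
app.on_resize([0, 1])  # manager reclaims two cores; app adapts
doubled = app.run(range(4), lambda x: 2 * x)
```

In a real system the resize callback would also repartition in-flight work and possibly migrate threads between cores; the sketch omits this and only captures the contract between manager and application.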
DistRM: distributed resource management for on-chip many-core systems