ABSTRACT
The trend towards many-core systems brings various challenges, among them highly dynamic and unpredictable workloads. New paradigms for managing the resources of many-core systems are therefore of paramount importance. The underlying problem, e.g. mapping applications to processor cores, is NP-hard, however, and requires heuristics, especially when solved online. In this paper, we present a novel resource-management scheme that supports so-called malleable applications, i.e. applications that can adapt their degree of parallelism to the resources assigned to them. By design, our decentralized scheme is scalable and copes with the computational complexity by focusing on local decision-making. Our simulations show that the quality of the mapping decisions of our approach stays close to that of state-of-the-art centralized online schemes for malleable applications, at a significantly reduced communication overhead (only about 12.75% on a 1024-core system with a total workload of 32 multi-threaded applications). In addition, our approach is scalable, as opposed to a centralized scheme, and is therefore practically useful for large many-core systems, as our extensive studies and experiments show.
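The notion of a malleable application, central to the abstract, can be illustrated with a minimal sketch. The class and method names below (`MalleableApp`, `on_resize`) are hypothetical and are not part of the paper's actual interface; the sketch only shows the defining property of malleability, namely that the application resizes its own degree of parallelism whenever the resource manager changes its core allocation.

```python
from concurrent.futures import ThreadPoolExecutor

class MalleableApp:
    """Hypothetical sketch of a malleable application: it adapts its
    degree of parallelism to whatever cores the manager assigns."""

    def __init__(self, assigned_cores):
        self.assigned_cores = list(assigned_cores)
        # One worker per assigned core.
        self.pool = ThreadPoolExecutor(max_workers=len(self.assigned_cores))

    def on_resize(self, new_cores):
        # Called when the (decentralized) resource manager changes the
        # allocation; the app shrinks or grows its worker pool accordingly.
        self.pool.shutdown(wait=True)
        self.assigned_cores = list(new_cores)
        self.pool = ThreadPoolExecutor(max_workers=len(self.assigned_cores))

    def run(self, work_items, worker):
        # Process the work items with exactly as many workers as cores.
        return list(self.pool.map(worker, work_items))

app = MalleableApp(assigned_cores=[0, 1, 2, 3])
squares = app.run(range(8), lambda x: x * x)
app.on_resize([0, 1])  # manager reclaims two cores; app adapts
doubled = app.run(range(4), lambda x: 2 * x)
```

In a real system the resize callback would also repartition in-flight work and possibly migrate threads between cores; the sketch omits this and only captures the contract between manager and application.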
DistRM: distributed resource management for on-chip many-core systems