
Analytical Performance Models for NoCs with Multiple Priority Traffic Classes

Authors:
Sumit K. Mandal (ORCID: 0000-0002-9294-1603), Arizona State University, Tempe, AZ
Raid Ayoub, Intel Corporation, Hillsboro, OR
Michael Kishinevsky, Intel Corporation, Hillsboro, OR
Umit Y. Ogras, Arizona State University, Tempe, AZ

ACM Transactions on Embedded Computing Systems, Volume 18, Issue 5s, Article No. 52, pp. 1–21. https://doi.org/10.1145/3358176

Published: 07 October 2019


Abstract

Networks-on-chip (NoCs) have become the standard for interconnect solutions in industrial designs ranging from client CPUs to many-core chip-multiprocessors. Since NoCs play a vital role in system performance and power consumption, pre-silicon evaluation environments include cycle-accurate NoC simulators. Long simulations increase the execution time of evaluation frameworks, which are already notoriously slow, and prohibit design-space exploration. Existing analytical NoC models, which assume fair arbitration, cannot replace these simulations since industrial NoCs typically employ priority schedulers and multiple priority classes. To address this limitation, we propose a systematic approach to construct priority-aware analytical performance models using micro-architecture specifications and input traffic. Our approach decomposes the given NoC into individual queues with modified service time to enable accurate and scalable latency computations. Specifically, we introduce novel transformations along with an algorithm that iteratively applies these transformations to decompose the queuing system. Experimental evaluations using real architectures and applications show 97% accuracy and up to 2.5× speedup in full-system simulation.
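To make concrete why a fair-arbitration model mispredicts latency under priority scheduling, the sketch below computes the mean waiting time of each class in the classical single-server M/G/1 queue with non-preemptive priority (Cobham's formula). It is a textbook baseline, not the authors' decomposition: it assumes Poisson arrivals and a single isolated queue, whereas NoC routers are discrete-time, have deterministic service, and form networks of interacting queues, which is what the paper's transformations address. The names `TrafficClass` and `nonpreemptive_priority_waits` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrafficClass:
    """One priority class feeding a shared server (hypothetical illustration type)."""
    arrival_rate: float   # lambda_i, packets per cycle (Poisson arrivals assumed)
    mean_service: float   # E[S_i], cycles
    second_moment: float  # E[S_i^2], cycles^2 (equals E[S_i]^2 for deterministic service)


def nonpreemptive_priority_waits(classes: List[TrafficClass]) -> List[float]:
    """Mean queueing delay per class in an M/G/1 non-preemptive priority queue.

    Classes are ordered from highest priority (index 0) to lowest.
    Cobham's formula: W_k = W0 / ((1 - sigma_{k-1}) * (1 - sigma_k)),
    where sigma_k is the cumulative load of classes 0..k.
    """
    # Mean residual service time seen by an arrival: W0 = sum(lambda_i * E[S_i^2]) / 2
    w0 = sum(c.arrival_rate * c.second_moment for c in classes) / 2.0
    waits: List[float] = []
    sigma_prev = 0.0  # cumulative load of all strictly higher-priority classes
    for c in classes:
        sigma = sigma_prev + c.arrival_rate * c.mean_service
        if sigma >= 1.0:
            raise ValueError("cumulative load must stay below 1 for a stable queue")
        waits.append(w0 / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits


if __name__ == "__main__":
    # Two classes, deterministic 1-cycle service, loads 0.3 (high) and 0.4 (low).
    hi = TrafficClass(arrival_rate=0.3, mean_service=1.0, second_moment=1.0)
    lo = TrafficClass(arrival_rate=0.4, mean_service=1.0, second_moment=1.0)
    print(nonpreemptive_priority_waits([hi, lo]))
```

With these inputs the high-priority class waits 0.5 cycles and the low-priority class about 1.67 cycles, while a FCFS (fair) M/G/1 model would assign both classes the same wait of about 1.17 cycles. The load-weighted average is identical in both cases, so a fair model can match aggregate behavior yet misestimate each individual class.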


Index Terms

- Computer systems organization → Embedded and cyber-physical systems → System on a chip
- Networks → Network performance evaluation → Network performance modeling



Published in
ACM Transactions on Embedded Computing Systems, Volume 18, Issue 5s
Special Issue ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019
October 2019, 1423 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3365919
Editor: Sandeep K. Shukla, Indian Institute of Technology, India
Copyright © 2019 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Journal family: ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 7 October 2019
- Accepted: 1 July 2019
- Revised: 1 June 2019
- Received: 1 April 2019

Author Tags
NoC performance analysis
priority-based NoC
queuing networks
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total citations: 16
- Total downloads: 501
- Downloads (last 12 months): 124
- Downloads (last 6 weeks): 9