On the simulation of large-scale architectures using multiple application abstraction levels

Authors: Alejandro Rico, Felipe Cabarcas, Carlos Villavieja, Milan Pavlovic, Augusto Vega, Yoav Etsion, Alex Ramirez, Mateo Valero

Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, Spain

ACM Transactions on Architecture and Code Optimization, Volume 8, Issue 4, Article No. 36, pp. 1–20. https://doi.org/10.1145/2086696.2086715

Published: 26 January 2012

Abstract

Simulation is a key tool for computer architecture research. In particular, cycle-accurate simulators are extremely important for microarchitecture exploration and detailed design decisions, but they are slow and therefore unsuitable for simulating large-scale architectures, a task for which they were never intended. Moreover, microarchitecture design decisions are irrelevant, or even misleading, in early processor design stages and high-level explorations. This makes it possible to raise the abstraction level of the simulated architecture, and also that of the application, which does not necessarily have to be represented as an instruction stream.

In this paper we define different application abstraction levels and show how they are employed in TaskSim, a multi-core architecture simulator, to provide several architecture modeling abstractions and to simulate large-scale architectures with hundreds of cores. We compare the simulation speed of these abstraction levels to that of existing simulation tools, and also evaluate their utility and accuracy. Our simulations show that a very high-level abstraction, which may be even faster than native execution, is useful for scalability studies on parallel applications, and that by simulating only explicit memory transfers we achieve accurate simulations for architectures using non-coherent scratchpad memories, with just a 25x slowdown compared to native execution. Furthermore, we revisit trace memory simulation techniques, which are more abstract than instruction-by-instruction simulation and provide an 18x simulation speedup.
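To make the idea of a very high-level application abstraction more concrete, the following is a minimal sketch in which each task is reduced to a single computation burst of known duration and no instructions or memory accesses are simulated. It is not TaskSim's implementation: the function name `simulate_bursts`, the in-order dispatch policy, the assumption that tasks are independent, and the trace values are all hypothetical and chosen only for illustration.

```python
import heapq


def simulate_bursts(task_durations, num_cores):
    """Return the simulated makespan for a list of computation bursts.

    Assumptions (illustrative only): tasks are independent, each task is a
    single burst of known duration, and tasks are dispatched in order to
    whichever core becomes idle first.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    # Min-heap holding the time at which each core becomes idle.
    core_free_at = [0.0] * num_cores
    heapq.heapify(core_free_at)
    for duration in task_durations:
        start = heapq.heappop(core_free_at)  # earliest idle core
        heapq.heappush(core_free_at, start + duration)
    # The run ends when the last core finishes its final burst.
    return max(core_free_at)


if __name__ == "__main__":
    # Hypothetical trace: 64 bursts of 10 time units each (made-up values).
    trace = [10.0] * 64
    for cores in (1, 4, 16, 64):
        makespan = simulate_bursts(trace, cores)
        print(cores, makespan, sum(trace) / makespan)
```

Because the simulator only advances per-core clocks by whole bursts, its cost grows with the number of tasks rather than the number of instructions, which is one plausible reason such a level can be fast enough for scalability studies across many core counts.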

Index Terms

1. Computing methodologies
  1. Modeling and simulation
Published in

ACM Transactions on Architecture and Code Optimization Volume 8, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2012
765 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2086696

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 January 2012
- Accepted: 1 November 2011
- Revised: 1 October 2011
- Received: 1 July 2011
Author Tags
Multi-core
abstraction levels
simulation
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 38
- Total Downloads: 937
- Downloads (Last 12 months): 56
- Downloads (Last 6 weeks): 13