research-article

Free Access

Optimizing software runtime systems for speculative parallelization

Authors:
Paraskevas Yiapanis

University of Manchester, Manchester, UK

University of Manchester, Manchester, UK
View Profile

,
Demian Rosas-Ham

University of Manchester, Manchester, UK

University of Manchester, Manchester, UK
View Profile

,
Gavin Brown

University of Manchester, Manchester, UK

University of Manchester, Manchester, UK
View Profile

,
Mikel Luján

University of Manchester, Manchester, UK

University of Manchester, Manchester, UK
View Profile

ACM Transactions on Architecture and Code Optimization Volume 9 Issue 4Article No.: 39pp 1–27https://doi.org/10.1145/2400682.2400698

Published:20 January 2013Publication History

ACM Transactions on Architecture and Code Optimization

Abstract

Thread-Level Speculation (TLS) overcomes limitations intrinsic with conservative compile-time auto-parallelizing tools by extracting parallel threads optimistically and only ensuring absence of data dependence violations at runtime.

A significant barrier for adopting TLS (implemented in software) is the overheads associated with maintaining speculative state. Based on previous TLS limit studies, we observe that on future multicore systems we will likely have more cores idle than those which traditional TLS would be able to harness. This implies that a TLS system should focus on optimizing for small number of cores and find efficient ways to take advantage of the idle cores. Furthermore, research on optimistic systems has covered two important implementation design points: eager vs. lazy version management. With this knowledge, we propose new simple and effective techniques to reduce the execution time overheads for both of these design points.

This article describes a novel compact version management data structure optimized for space overhead when using a small number of TLS threads. Furthermore, we describe two novel software runtime parallelization systems that utilize this compact data structure. The first software TLS system, MiniTLS, relies on eager memory data management (in-place updates) and, thus, when a misspeculation occurs a rollback process is required. MiniTLS takes advantage of the novel compact version management representation to parallelize the rollback process and is able to recover from misspeculation faster than existing software eager TLS systems.

The second one, Lector (Lazy inspECTOR) is based on lazy version management. Since we have idle cores, the question is whether we can create “helper” tasks to determine whether speculation is actually needed without stopping or damaging the speculative execution. In Lector, for each conventional TLS thread running speculatively with lazy version management, there is associated with it a lightweight inspector. The inspector threads execute alongside to verify quickly whether data dependencies will occur. Inspector threads are generated by standard techniques for inspector/executor parallelization.

We have applied both TLS systems to seven Java sequential benchmarks, including three benchmarks from SPECjvm2008. Two out of the seven benchmarks exhibit misspeculations. MiniTLS experiments report average speedups of 1.8x for 4 threads increasing close to 7x speedups with 32 threads. Facilitated by our novel compact representation, MiniTLS reduces the space overhead over state-of-the-art software TLS systems between 96% on 2 threads and 40% on 32 threads. The experiments for Lector, report average speedups of 1.7x for 2 threads (that is 1 TLS + 1 Inspector threads) increasing close to 8.2x speedups with 32 threads (16 + 16 threads). Compared to a well established software TLS baseline, Lector performs on average 1.7x faster for 32 threads and in no case (x TLS + x Inspector threads) Lector delivers worse performance than the baseline TLS with the equivalent number of TLS threads (i.e. x TLS threads) nor doubling the equivalent number of TLS threads (i.e., x + x TLS threads).

References

Agarwal, A., Simoni, R., Hennessy, J., and Horowitz, M. 1988. An evaluation of directory schemes for cache coherence. In Proceedings of the International Symposium on Computer Architecture. 280--298. Google ScholarDigital Library
Ali-Reza, Adl-Tabatabai, Shpeisman, T., and Gottschlich, J. 2012. Draft specification of transactional language constructs for C++. Tech. rep., Transactional Memory Specification Drafting Group.Google Scholar
Chen, M. K. and Olukotun, K. 2003. The JRPM system for dynamically parallelizing Java programs. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA'03). 434--446. Google ScholarDigital Library
Chen, T., Min, F., Nagarajan, V., and Gupta, R. 2008. Copy or discard execution model for speculative parallelization on multicores. In Proceedings of the International Symposium on Microarchitecture. 330--341. Google ScholarDigital Library
Cintra, M. and Llanos, D. 2005. Design space exploration of a software speculative parallelization scheme. IEEE Trans. Paral. Distrib. Syst. 16, 6, 562--576. Google ScholarDigital Library
Cintra, M., Martínez, J. F., and Torrellas, J. 2000. Architectural support for scalable speculative parallelization in shared-memory multiprocessors. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA'00). 13--24. Google ScholarDigital Library
Culler, D., Singh, J., and Gupta, A. 1998. Parallel Computer Architecture: A Hardware/Software Approach. Google ScholarDigital Library
Dalessandro, L., Spear, M. F., and Scott, M. L. 2010. NOrec: streamlining STM by abolishing ownership records. In Proceedings of Symposium on Principles and Practice of Parallel Programming. 67--78. Google ScholarDigital Library
Dang, F., Yu, H., and Rauchwerger, L. 2002. The R-LRPD Test: Speculative parallelization of partially parallel loops. In Proceedings of International Parallel and Distributed Processing Symposium. 20--29. Google ScholarDigital Library
Dice, D., Shalev, O., and Shavit, N. 2006. Transactional Locking II. In Proceedings of the 20th International Symposium on Distributed Computing (DISC 2006). 194--208. Google ScholarDigital Library
Dice, D. and Shavit, N. 2010. TLRW: return of the read-write lock. In Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'10). 284--293. Google ScholarDigital Library
Felber, P., Fetzer, C., and Riegel, T. 2008. Dynamic performance tuning of word-based software transactional memory. In Proceedings of the Symposium on Principles and Practice of Parallel Programming. 237--246. Google ScholarDigital Library
Gupta, A., Weber, W.-D., and Mowry, T. C. 1990. Reducing memory and traffic requirements for scalable directory-based cache coherence schemes. In Proceedings of the International Conference on Parallel Processing (ICPP). 312--321.Google Scholar
Ioannou, N., Singer, J., Khan, S., Yiapanis, P., Pocock, A., Xekalakis, P., Brown, G., Luján, M., Watson, I., and Cintra, M. 2010. Toward a more accurate understanding of the limits of the TLS execution paradigm. In Proceedings of the IEEE International Symposium on Workload Characterization. Google ScholarDigital Library
Johnson, T., Eigenmann, R., and Vijaykumar, T. 2007. Speculative thread decomposition through empirical optimization. In Proceedings of the Principles and Practice of Parallel Programming. 205--214. Google ScholarDigital Library
Johnson, T. A., Eigenmann, R., and Vijaykumar, T. N. 2004. Min-cut program decomposition for thread-level speculation. In Proceedings of the on Programming Language Design and Implementation. 59--70. Google ScholarDigital Library
Kim, H., Johnson, N., Lee, J., Mahlke, S., and August, D. I. 2012. Automatic speculative DOALL for clusters. In Proceedings of the International Symposium on Code Generation and Optimization (CGO). Google ScholarDigital Library
Kim, H., Raman, A., Liu, F., Lee, J. W., and August, D. I. 2010. Scalable speculative parallelization on commodity clusters. In Proceedings of the International Symposium on Microarchitecture (MICRO'43). 3--14. Google ScholarDigital Library
Liu, W., Tuck, J., Ceze, L., Ahn, W., Strauss, K., Renau, J., and Torrellas, J. 2006. POSH: A TLS compiler that exploits program structure. In Proceedings of the 11th ACM SIGPLAN Symposium on Principles of Parallel Programming. 158--167. Google ScholarDigital Library
Luo, Y., Packirisamy, V., Hsu, W.-C., Zhai, A., Mungre, N., and Tarkas, A. 2009. Dynamic performance tuning for speculative threads. In Proceedings of the International Symposium on Computer Architecture. 462--473. Google ScholarDigital Library
Madriles, C., Lopez, P., Codina, J. M., Gibert, E., Latorre, F., Martinez, A., Martinez, R., and Gonzalez, A. 2009. Anaphase: A fine-grain thread decomposition scheme for speculative multithreading. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. 15--25. Google ScholarDigital Library
Mehrara, M., Hao, J., Hsu, P.-C., and Mahlke, S. 2009. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation. 166--176. Google ScholarDigital Library
Oancea, C., Mycroft, A., and Harris, T. 2009. A lightweight in-place implementation for software thread-level speculation. In Proceedings of the Symposium on Parallelism in Algorithms and Architectures. 223--232. Google ScholarDigital Library
Ottoni, G., Rangan, R., Stoler, A., and August, D. I. 2005. Automatic thread extraction with decoupled software pipelining. In Proceedings of the International Symposium on Microarchitecture. 105--118. Google ScholarDigital Library
Packirisamy, V., Zhai, A., Hsu, W.-C., Yew, P.-C., and Ngai, T.-F. 2009. Exploring speculative parallelism in SPEC2006. In Proceedings of the International Symposium on Performance Analysis of Systems and Software. 77--88.Google Scholar
Prabhu, M. K. and Olukotun, K. 2005. Exposing speculative thread parallelism in SPEC2000. In Proceedings of the International Symposium on Principles and Practice of Parallel programming. 142--152. Google ScholarDigital Library
Quiñones, C. G., Madriles, C., Sánchez, J., Marcuello, P., González, A., and Tullsen, D. M. 2005. Mitosis compiler: An infrastructure for speculative threading based on pre-computation slices. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI). 269--279. Google ScholarDigital Library
Raman, A., Kim, H., Mason, T. R., Jablin, T. B., and August, D. I. 2010. Speculative parallelization using software multi-threaded transactions. In Proceedings of the Architectural Support for Programming Languages and Operating Systems. 65--76. Google ScholarDigital Library
Rauchwerger, L. 1998. Run-time parallelization: Its time has come. Paral. Comput. 24, 527--556. Google ScholarDigital Library
Rauchwerger, L. and Padua, D. 1994. The privatizing DOALL test: A run-time technique for DOALL loop identification and array privatization. In Proceedings of the 8th International Conference on Supercomputing (ICS'94). 33--43. Google ScholarDigital Library
Rauchwerger, L. and Padua, D. 1995. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. In Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation (PLDI'95). 218--232. Google ScholarDigital Library
Saha, B., Adl-Tabatabai, A.-R., Hudson, R. L., Minh, C. C., and Hertzberg, B. 2006. McRT-STM: A high performance software transactional memory system for a multi-core runtime. In Proceedings of ACM Symposium on Principles and Practice of Parallel Programming (PPoPP'06). 187--197. Google ScholarDigital Library
Saltz, J. H., Berryman, H., and Wu, J. 1991. Multiprocessors and run-time compilation. Concurr. Practice Exper. 573--592.Google Scholar
Spear, M. F., Dalessandro, L., Marathe, V. J., and Scott, M. L. 2009a. A comprehensive strategy for contention management in software transactional memory. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'09). 141--150. Google ScholarDigital Library
Spear, M. F., Shriraman, A., Daless, L., and Scott, M. L. 2009b. Transactional mutex locks. In Proceedings of the SIGPLAN Workshop on Transactional Computing.Google Scholar
Tian, C., Feng, M., and Gupta, R. 2010. Supporting speculative parallelization in the presence of dynamic data structures. In Proceedings of the Symposium on Programming Language Design and Implementation. 62--73. Google ScholarDigital Library
Wang, A., Gaudet, M., Wu, P., Amaral, J., Ohmacht, M., Barton, C., Silvera, R., and Michael, M. 2012. Evaluation of blue Gene/Q hardware support for transactional memories. In Proceedings of the Symposium on Parallel Architecture and Compilation Techniques. Google ScholarDigital Library
Wang, C., Chen, W.-Y., Wu, Y., Saha, B., and Adl-Tabatabai, A.-R. 2007. Code generation and optimization for transactional memory constructs in an unmanaged language. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'07). 34--48. Google ScholarDigital Library
Watson, I., Kirkham, C., and Lujan, M. 2007. A study of a transactional parallel routing algorithm. In Proceedings of the Symposium on Parallel Architecture and Compilation Techniques. 388--398. Google ScholarDigital Library

Index Terms

Optimizing software runtime systems for speculative parallelization
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        Concurrent programming languages

Recommendations

Compiler-Driven Software Speculation for Thread-Level Parallelism

Current parallelizing compilers can tackle applications exercising regular access patterns on arrays or affine indices, where data dependencies can be expressed in a linear form. Unfortunately, there are cases that independence between statements of code ...
Read More
Speculative parallelization using software multi-threaded transactions
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems

With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative ...
Read More
Speculative parallelization using software multi-threaded transactions
ASPLOS '10

With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Architecture and Code Optimization Volume 9, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2013
876 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2400682
Issue’s Table of Contents

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 January 2013
- Accepted: 1 November 2012
- Revised: 1 September 2012
- Received: 1 June 2012
Published in taco Volume 9, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
SPECjvm2008
Thread-level speculation
inspector threads
loop-level parallelism
memory overhead
multicore processors
runtime parallelization
speculative parallelization
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 17
  Total Citations
  View Citations
- 708
  Total Downloads
- Downloads (Last 12 months)49
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Optimizing software runtime systems for speculative parallelization

ACM Transactions on Architecture and Code Optimization

Abstract

References

Cited By

Index Terms

Recommendations

Compiler-Driven Software Speculation for Thread-Level Parallelism

Speculative parallelization using software multi-threaded transactions

Speculative parallelization using software multi-threaded transactions

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Optimizing software runtime systems for speculative parallelization

ACM Transactions on Architecture and Code Optimization

Abstract

References

Cited By

Index Terms

Recommendations

Compiler-Driven Software Speculation for Thread-Level Parallelism

Speculative parallelization using software multi-threaded transactions

Speculative parallelization using software multi-threaded transactions

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media