Abstract
Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations, but sub-optimal architectural design. Consequently, we attempt to identify and eliminate major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions that we propose are: low-complexity commit in constant time to eliminate the task commit bottleneck, a memory-based overflow area to eliminate stall due to speculative buffer overflow, and exploiting high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.
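To make the execution model concrete, the following is a minimal software sketch of thread-level speculation semantics: loop iterations run as ordered speculative tasks, writes are buffered until in-order commit, and a cross-task read-after-write violation squashes and re-executes the offending task. All names and structure here are illustrative assumptions, not the paper's hardware design (which tracks versions in caches and commits in constant time).

```python
# Illustrative model of thread-level speculation (TLS) semantics.
# Hypothetical sketch: not the paper's hardware mechanism.

class Task:
    def __init__(self, tid):
        self.tid = tid       # task order = original iteration order
        self.reads = set()   # addresses read while speculative
        self.writes = {}     # buffered writes: address -> value

def run_tasks_speculatively(memory, bodies):
    """Run loop-iteration bodies as ordered speculative tasks.

    Each body is a function(read, write) for one iteration.
    Writes are buffered per task; commit is strictly in task order.
    If a later task read a location before an earlier task's write
    committed (a cross-task RAW dependence violation), the later
    task is squashed and re-executed with up-to-date memory.
    """
    tasks = [Task(i) for i in range(len(bodies))]

    def execute(task):
        task.reads.clear()
        task.writes.clear()
        def read(addr):
            task.reads.add(addr)
            # A task sees its own buffered writes first, then memory.
            return task.writes.get(addr, memory[addr])
        def write(addr, val):
            task.writes[addr] = val
        bodies[task.tid](read, write)

    for t in tasks:
        execute(t)                       # all tasks run speculatively

    for i, t in enumerate(tasks):
        memory.update(t.writes)          # in-order commit
        for later in tasks[i + 1:]:
            if later.reads & t.writes.keys():
                execute(later)           # violation: squash + re-execute
    return memory
```

For example, on the loop-carried dependence `a[i] = a[i-1] + 1`, every speculative task initially reads a stale zero; each commit then triggers a squash of the next task, and re-execution restores the sequential result. This serialization under true dependences is exactly why, as the abstract notes, scalability problems usually stem from the architecture rather than from the violations themselves.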