article

Removing architectural bottlenecks to the scalability of speculative parallelization

Published: 01 May 2001

Abstract

Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations but suboptimal architectural design. We therefore attempt to identify and eliminate the major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions we propose are: a low-complexity, constant-time commit that eliminates the task-commit bottleneck; a memory-based overflow area that eliminates stalls caused by speculative buffer overflow; and the exploitation of high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we simulate systems with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.
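The abstract does not spell out how the commit and overflow mechanisms work, so the following is only a minimal Python sketch under our own assumptions, not the paper's design. It assumes that every speculative version is tagged with the ID of the task that wrote it, that tasks commit strictly in order, and that a plain dictionary can stand in for the memory-based overflow area. Under those assumptions, committing a task only advances a counter, so its cost does not depend on how much data the task wrote, and a full speculative cache spills versions instead of stalling. All names (`Node`, `SpecSystem`, `commit`, `drain`, `capacity`) are hypothetical, and the sketch omits dependence-violation detection and squashing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One processor's speculative storage (hypothetical model).

    `cache` holds up to `capacity` (assumed >= 1) speculative versions keyed
    by (addr, task_id). When it is full, a version spills into `overflow`,
    a stand-in for a memory-based overflow area, instead of stalling the task.
    """
    capacity: int
    cache: dict = field(default_factory=dict)     # (addr, task_id) -> value
    overflow: dict = field(default_factory=dict)  # (addr, task_id) -> value

    def write(self, addr: int, task_id: int, value: int) -> None:
        key = (addr, task_id)
        if key not in self.cache and len(self.cache) >= self.capacity:
            # Displace the oldest-inserted version into the overflow area (no stall).
            victim = next(iter(self.cache))
            self.overflow[victim] = self.cache.pop(victim)
        self.overflow.pop(key, None)  # the cache copy is now the current one
        self.cache[key] = value

    def versions(self):
        yield from self.cache.items()
        yield from self.overflow.items()


class SpecSystem:
    """Tasks commit in order; committing a task only advances a counter."""

    def __init__(self, memory: dict, nodes: list):
        self.memory = memory       # non-speculative main memory: addr -> value
        self.nodes = nodes
        self.last_committed = -1   # highest committed task ID

    def commit(self, task_id: int) -> None:
        # O(1): the task's versions become committed implicitly because their
        # task ID is now <= last_committed. Write-back is deferred to drain().
        if task_id != self.last_committed + 1:
            raise ValueError("tasks must commit in order")
        self.last_committed = task_id

    def committed_value(self, addr: int) -> int:
        # The newest committed version wins; otherwise fall back to main memory.
        best_tid, best_val = -1, self.memory.get(addr, 0)
        for node in self.nodes:
            for (a, tid), val in node.versions():
                if a == addr and best_tid < tid <= self.last_committed:
                    best_tid, best_val = tid, val
        return best_val

    def drain(self) -> None:
        # Lazy write-back, off the commit path: move every committed version to
        # memory in task order so the newest committed value ends up in memory.
        committed = []
        for node in self.nodes:
            for store in (node.cache, node.overflow):
                for addr, tid in list(store):
                    if tid <= self.last_committed:
                        committed.append((tid, addr, store.pop((addr, tid))))
        for tid, addr, val in sorted(committed):
            self.memory[addr] = val


nodes = [Node(capacity=2), Node(capacity=2)]
machine = SpecSystem(memory={}, nodes=nodes)
nodes[0].write(0x10, task_id=0, value=1)
nodes[1].write(0x10, task_id=1, value=2)
nodes[0].write(0x20, task_id=0, value=7)
nodes[0].write(0x30, task_id=0, value=9)    # cache full: (0x10, 0) spills to overflow
machine.commit(0)
print(machine.committed_value(0x10))        # 1 (task 1 has not committed yet)
machine.commit(1)
print(machine.committed_value(0x10))        # 2
machine.drain()
print(machine.memory)                       # {16: 2, 32: 7, 48: 9}
```

In this model, the work that an eager commit would perform (writing every speculative line back to memory) moves into `drain`, which can run lazily without delaying the next task's commit. Versions held in the overflow area stay visible to `committed_value` exactly like cached ones.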

Published in

ACM SIGARCH Computer Architecture News, Volume 29, Issue 2
Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)
May 2001
262 pages
ISSN: 0163-5964
DOI: 10.1145/384285

ACM Conferences: ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture
June 2001
289 pages
ISBN: 0769511627
DOI: 10.1145/379240

                  Copyright © 2001 Authors

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States
