Abstract
Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations, but sub-optimal architectural design. Consequently, we attempt to identify and eliminate major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions that we propose are: low-complexity commit in constant time to eliminate the task commit bottleneck, a memory-based overflow area to eliminate stall due to speculative buffer overflow, and exploiting high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.
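To make the execution model concrete, the following is a minimal software sketch of thread-level speculation semantics: loop iterations run as ordered speculative tasks, writes are buffered until in-order commit, and a cross-task read-after-write violation squashes and re-executes the offending task. All names and structure here are illustrative assumptions, not the paper's hardware design (which tracks versions in caches and commits in constant time).

```python
# Illustrative model of thread-level speculation (TLS) semantics.
# Hypothetical sketch: not the paper's hardware mechanism.

class Task:
    def __init__(self, tid):
        self.tid = tid       # task order = original iteration order
        self.reads = set()   # addresses read while speculative
        self.writes = {}     # buffered writes: address -> value

def run_tasks_speculatively(memory, bodies):
    """Run loop-iteration bodies as ordered speculative tasks.

    Each body is a function(read, write) for one iteration.
    Writes are buffered per task; commit is strictly in task order.
    If a later task read a location before an earlier task's write
    committed (a cross-task RAW dependence violation), the later
    task is squashed and re-executed with up-to-date memory.
    """
    tasks = [Task(i) for i in range(len(bodies))]

    def execute(task):
        task.reads.clear()
        task.writes.clear()
        def read(addr):
            task.reads.add(addr)
            # A task sees its own buffered writes first, then memory.
            return task.writes.get(addr, memory[addr])
        def write(addr, val):
            task.writes[addr] = val
        bodies[task.tid](read, write)

    for t in tasks:
        execute(t)                       # all tasks run speculatively

    for i, t in enumerate(tasks):
        memory.update(t.writes)          # in-order commit
        for later in tasks[i + 1:]:
            if later.reads & t.writes.keys():
                execute(later)           # violation: squash + re-execute
    return memory
```

For example, on the loop-carried dependence `a[i] = a[i-1] + 1`, every speculative task initially reads a stale zero; each commit then triggers a squash of the next task, and re-execution restores the sequential result. This serialization under true dependences is exactly why, as the abstract notes, scalability problems usually stem from the architecture rather than from the violations themselves.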