Abstract
Writing software for multicore processors would be greatly simplified if sequential programs could be parallelized automatically. Although auto-parallelization has been studied for many decades, it has succeeded only in a few application areas, such as dense matrix computations. In particular, auto-parallelization of irregular programs, which are organized around large, pointer-based data structures like graphs, has seemed intractable.
The Galois project is taking a fresh look at auto-parallelization. Rather than attempting to parallelize all programs no matter how obscurely they are written, we are designing programming abstractions that permit programmers to highlight opportunities for exploiting parallelism in sequential programs, and building a runtime system that uses these hints to execute the program in parallel. In this paper, we describe the design and implementation of a system based on these ideas. Experimental results for two real-world irregular applications, a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering, demonstrate that this approach is promising.
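To make the abstraction concrete, the sketch below shows the kind of unordered worklist loop the abstract alludes to, in which processing one element may create new work (as when refining a bad triangle produces new, possibly-bad triangles). This is a minimal sequential illustration with hypothetical names, not the actual Galois API; the Galois runtime's contribution is to run many such iterations speculatively in parallel, rolling back iterations whose neighborhoods conflict.

```python
from collections import deque

def refine_all(worklist):
    """Process an unordered worklist until empty; each step may add work.

    Stand-in for Delaunay refinement: an item greater than 1 is "bad"
    and is split into two smaller subproblems. In a Galois-style system
    the body of this loop would be a speculatively parallel iteration.
    """
    work = deque(worklist)
    processed = 0
    while work:
        item = work.popleft()
        processed += 1
        if item > 1:  # "bad" element: refining it creates new work
            work.append(item // 2)
            work.append(item - item // 2)
    return processed

# Refining a single element of size 4 touches 7 elements in total
# (4 -> 2, 2 -> four 1s), mirroring how refinement cascades.
print(refine_all([4]))  # → 7
```

The key property the abstractions expose is that the loop iterates over an unordered set: any schedule of the work items is acceptable, so iterations whose data accesses do not overlap can safely run concurrently.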