Optimistic parallelism requires abstractions
Proceedings of the 2007 PLDI conference

ABSTRACT
Irregular applications, which manipulate large, pointer-based data structures such as graphs, are difficult to parallelize manually. Automatic tools and techniques such as restructuring compilers and run-time speculative execution have failed to uncover much parallelism in these applications, despite substantial effort by the research community. These difficulties have even led some researchers to wonder whether there is any coarse-grain parallelism worth exploiting in irregular applications.
In this paper, we describe two real-world irregular applications: a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering. By studying the algorithms and data structures used in these applications, we show that there is substantial coarse-grain, data parallelism in these applications, but that this parallelism is very dependent on the input data and therefore cannot be uncovered by compiler analysis. In principle, optimistic techniques such as thread-level speculation can be used to uncover this parallelism, but we argue that current implementations cannot accomplish this because they do not use the proper abstractions for the data structures in these programs.
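To make the input dependence concrete, the following is a minimal sketch (not the paper's implementation) of the worklist pattern that underlies Delaunay mesh refinement: each iteration computes a "cavity" — a neighborhood of mesh elements around one bad element — so two iterations can run in parallel only if their cavities are disjoint, and the cavities are determined by the input mesh, not by the program text. All names here are illustrative.

```python
def refine(worklist, cavity_of, retriangulate):
    """Sequential reference loop: elements may be processed in any order.

    cavity_of(elem) returns the mesh elements an iteration touches;
    retriangulate(cavity) fixes the cavity and may return new bad elements.
    """
    processed = []
    while worklist:
        bad = worklist.pop()
        cavity = cavity_of(bad)           # shape depends on the input mesh
        new_bad = retriangulate(cavity)   # may create more work
        worklist.extend(new_bad)
        processed.append(bad)
    return processed

def independent(a, b, cavity_of):
    """Two iterations can run concurrently iff their cavities are disjoint."""
    return not (set(cavity_of(a)) & set(cavity_of(b)))
```

Because `cavity_of` is a function of the runtime mesh, no static analysis of this loop can prove that distinct iterations are independent; the independence test can only be evaluated once the input is known.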
These insights have informed our design of the Galois system, an object-based optimistic parallelization system for irregular applications. There are three main aspects to Galois: (1) a small number of syntactic constructs for packaging optimistic parallelism as iteration over ordered and unordered sets, (2) assertions about methods in class libraries, and (3) a runtime scheme for detecting and recovering from potentially unsafe accesses to shared memory made by an optimistic computation.
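The runtime scheme in point (3) can be illustrated with a minimal, single-threaded simulation of speculative commit and abort — this is a sketch of the general mechanism, not the actual Galois runtime or its API, and all names are illustrative. Each speculative iteration acquires an abstract lock per shared element it writes and logs the old value; touching an element another in-flight iteration holds triggers an abort that replays the undo log.

```python
class Conflict(Exception):
    """Raised when an iteration touches an element another iteration holds."""

class OptimisticRuntime:
    """Toy commit/abort model for speculative iterations over shared state."""
    _MISSING = object()  # sentinel: element was absent before the iteration

    def __init__(self, shared):
        self.shared = shared   # shared mutable state (a dict)
        self.locked = set()    # abstract locks held by in-flight iterations

    def run(self, writes):
        """Speculatively apply one iteration given as [(elem, new_value), ...].

        Returns True on commit; on conflict, rolls back every applied
        write and returns False so the iteration can be retried later.
        """
        undo, acquired = [], []
        try:
            for elem, val in writes:
                if elem in self.locked:
                    raise Conflict(elem)
                self.locked.add(elem)
                acquired.append(elem)
                undo.append((elem, self.shared.get(elem, self._MISSING)))
                self.shared[elem] = val      # speculative update
            return True                      # commit
        except Conflict:
            for elem, old in reversed(undo): # replay the undo log
                if old is self._MISSING:
                    del self.shared[elem]
                else:
                    self.shared[elem] = old
            return False                     # abort
        finally:
            for elem in acquired:            # release this iteration's locks
                self.locked.discard(elem)
```

A second iteration holding a lock can be simulated by seeding `locked` before calling `run`; the aborted iteration's writes are undone, leaving the shared state as it was before the attempt. Point (2) of the design refines this mechanism: assertions about method commutativity let the runtime treat semantically independent operations as non-conflicting even when they touch the same object.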
We show that Delaunay mesh generation and agglomerative clustering can be parallelized in a straightforward way using the Galois approach, and we present experimental measurements to show that this approach is practical. These results suggest that Galois is a practical approach to exploiting data parallelism in irregular programs.