skip to main content
10.1145/1250734.1250759acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article

Optimistic parallelism requires abstractions

Published:10 June 2007Publication History

ABSTRACT

Irregular applications, which manipulate large, pointer-based data structures like graphs, are difficult to parallelize manually. Automatic tools and techniques such as restructuring compilers and run-time speculative execution have failed to uncover much parallelism in these applications, in spite of a lot of effort by the research community. These difficulties have even led some researchers to wonder if there is any coarse-grain parallelism worth exploiting in irregular applications.

In this paper, we describe two real-world irregular applications: a Delaunay mesh refinement application and a graphics application thatperforms agglomerative clustering. By studying the algorithms and data structures used in theseapplications, we show that there is substantial coarse-grain, data parallelism in these applications, but that this parallelism is very dependent on the input data and therefore cannot be uncoveredby compiler analysis. In principle, optimistic techniques such asthread-level speculation can be used to uncover this parallelism, but we argue that current implementations cannot accomplish thisbecause they do not use the proper abstractions for the data structuresin these programs.

These insights have informed our design of the Galois system, an object-based optimistic parallelization system for irregular applications. There are three main aspects to Galois: (1) a small number of syntactic constructs for packaging optimistic parallelism as iteration over ordered and unordered sets, (2)assertions about methods in class libraries, and (3) a runtime scheme for detecting and recovering from potentially unsafe accesses to shared memory made by an optimistic computation.

We show that Delaunay mesh generation and agglomerative clustering can be parallelized in a straight-forward way using the Galois approach, and we present experimental measurements to show that this approach is practical. These results suggest that Galois is a practical approach to exploiting data parallelismin irregular programs.

References

  1. C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson, and Sean Lie. Unbounded transactional memory. In HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Christos D. Antonopoulos, Xiaoning Ding, Andrey Chernikov, Filip Blagojevic, Dimitrios S. Nikolopoulos, and Nikos Chrisochoides. Multigrain parallel delaunay mesh generation: challenges and opportunities for multithreaded architectures. In ICS '05: Proceedings of the 19th annual international conference on Supercomputing, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509--517, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Bernstein. Analysis of programs for parallel processing. IEEE Transactions on Electronic Computers, 1966.Google ScholarGoogle ScholarCross RefCross Ref
  5. Michael Burke, Paul Carini, and Jong-Deok Choi. Interprocedural pointer alias analysis. Technical Report IBM RC 21055, IBM Yorktown Heights, 1997.Google ScholarGoogle Scholar
  6. Brian D. Carlstrom, Austen McDonald, Christos Kozyrakis, and Kunle Olukotun. Transactional collection classes. In Principles and Practices of Parallel Programming (PPoPP), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. C. Foster and E. M. Riseman. Percolation of code to enhance parallel dispatching and execution. IEEE Transactions on Computers, 21(12):1411--1415, 1972.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. L. Paul Chew. Guaranteed-quality mesh generation for curved surfaces. In SCG '93: Proceedings of the ninth annual symposium on Computational geometry, pages 274--280, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Johan de Galas. The quest for more processing power: is the single core CPU doomed? http://www.anandtech.com/cpuchipsets/ showdoc.aspx?i=2377, February 2005.Google ScholarGoogle Scholar
  10. Pedro C. Diniz and Martin C. Rinard. Commutativity analysis: a new analysis technique for parallelizing compilers. ACM Trans. Program. Lang. Syst., 19(6):942--991, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Joseph A. Fisher. Very long instruction word architectures and the eli-512. In ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun. Transactional memory coherence and consistency. ISCA 2004, 00:102, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  13. Tim Harris and Keir Fraser. Language support for lightweight transactions. In OOPSLA '03: Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications, pages 388--402, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. John Hennessy and David Patterson, editors. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA '93: Proceedings of the 20th annual international symposium on Computer architecture, pages 289--300, New York, NY, USA, 1993. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Maurice P. Herlihy and William E. Weihl. Hybrid concurrency control for abstract data types. In PODS '88: Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 201--210, New York, NY, USA, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. David R. Jefferson. Virtual time. ACM Trans. Program. Lang. Syst., 7(3):404--425, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Guy L. Steele Jr. Making asynchronous parallelism safe for the world. In Proceedings of the 17th symposium on Principles of Programming Languages, pages 218--231, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. T. Schwartz, R. B. K. Dewar, E. Dubinsky, and E. Schonberg. Programming with sets: An introduction to SETL. Springer-Verlag Publishers, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ken Kennedy and John Allen, editors. Optimizing compilers for modren architectures:a dependence-based approach. Morgan Kaufmann, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, and David A. Wood. Logtm: Log-based transactional memory. In HPCA '06: Proceedings of the 12th International Symposium on High Performance Computer Architecture, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  22. J. Eliot B. Moss and Antony L. Hosking. Nested transactional memory: Model and preliminary architectural sketches. In SCOOL '05: Sychronization and Concurrency in Object-Oriented Languages, 2005.Google ScholarGoogle Scholar
  23. J. B. C Neto, P. A. Wawrzynek, M. T. M. Carvalho, L. F. Martha, and A. R. Ingraffea. An algorithm for three-dimensional mesh generation for arbitrary regions with cracks. Engineering with Computers, 17:75--91, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  24. Yang Ni, Vijay Menon, Ali-Reza Adl-Tabatabai, Antony L. Hosking, Rick Hudson, J. Eliot B. Moss, Bratin Saha, and Tatiana Shpeisman. Open nesting in software transactional memory. In Principles and Practices of Parallel Programming (PPoPP), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Openmp: A proposed industry standard api for shared memory programming. See www.openmp.org, October 28, 1997.Google ScholarGoogle Scholar
  26. Michael Steinbach Pang-Ning Tan and Vipin Kumar, editors. Introduction to Data Mining. Pearson Addison Wesley, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. Ponnusamy, J. Saltz, and A. Choudhary. Runtime compilation techniques for data partitioning and communication schedule reuse. In Proceedings of the 1993 ACM/IEEE conference on Supercomputing, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Hany E. Ramadan, Donald E. Porter Christopher J. Rossbach, Owen S. Hofmann, Aditya Bhandari, and Emmett Witchel. Transactional memory designs for an operating system. In International Symposium on Computer Architecture (ISCA), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Lawrence Rauchwerger and David A. Padua. Parallelizing while loops for multiprocessor systems. In IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Lawrence Rauchwerger and David A. Padua. The lrpd test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst., 10(2):160--180, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. In Proceedings of the 23rd Annual ACM Symposium on the Principles of Programming Languages, St. Petersburg Beach, FL, January 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. William Scherer and Michael Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the Fifteenth ACM Symposium on Principles of Distributed Computing, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. B. Selman, H. Levesque, and D. Mitchell. A new method for solving hard satisfiability problems. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 440--446, 1992.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Nir Shavit and Dan Touitou. Software transactional memory. In PODC '95: Proceedings of the fourteenth annual ACM Symposium on Principles of Distributed Computing, pages 204--213, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Jonathan Richard Shewchuk. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. In Applied Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture Notes in Computer Science, pages 203--222. May 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. J. Greggory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry. A scalable approach to thread-level speculation. In ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Robert Tomasulo. An algorithm for exploiting multiple arithmetic units. IBM Journal, 11(1):25--33, 1967.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Christoph von Praun, Luis Ceze, and Calin Cascaval. Implicit parallelism with ordered transactions. In Principles and Practices of Parallel Programming (PPoPP), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Bruce Walter, Sebastian Fernandez, Adam Arbree, Kavita Bala, Michael Donikian, and Donald Greenberg. Lightcuts: a scalable approach to illumination. ACM Transactions on Graphics (SIGGRAPH), 24(3):1098--1107, July 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Commutativity-based concurrency control for abstract data types. IEEE Transactions on Computers, 37(12), 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Niklaus Wirth. Algorithms + Data Structures = Programs. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Peng Wu and David A. Padua. Beyond arrays - a container-centric approach for parallelization of real-world symbolic applications. In LCPC '98: Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. L. Rauchwerger Y. Zhan and J. Torrellas. Hardware for speculative run--time parallelization in distributed shared-memory multiprocessors. In HPCA '98: Proceedings of the 4th International Symposium on High-Performance Computer Architecture, page 162, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Optimistic parallelism requires abstractions

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2007
        508 pages
        ISBN:9781595936332
        DOI:10.1145/1250734
        • cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 42, Issue 6
          Proceedings of the 2007 PLDI conference
          June 2007
          491 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/1273442
          Issue’s Table of Contents

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 June 2007

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate406of2,067submissions,20%

        Upcoming Conference

        PLDI '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader