2012 | Original Paper | Book Chapter
A Practical Approach to DOACROSS Parallelization
Authors: Priya Unnikrishnan, Jun Shirako, Kit Barton, Sanjay Chatterjee, Raul Silvera, Vivek Sarkar
Published in: Euro-Par 2012 Parallel Processing
Publisher: Springer Berlin Heidelberg
Loops with cross-iteration dependences (doacross loops) often contain significant amounts of parallelism that can potentially be exploited on modern manycore processors. However, most production-strength compilers focus their automatic parallelization efforts on doall loops, and consider doacross parallelism to be impractical due to the space inefficiencies and the synchronization overheads of past approaches. This paper presents a novel and practical approach to automatically parallelizing doacross loops for execution on manycore-SMP systems. We introduce a compiler-and-runtime optimization called dependence folding that bounds the number of synchronization variables allocated per worker thread (processor core) to be at most the maximum depth of a loop nest being considered for automatic parallelization. Our approach has been implemented in a development version of the IBM XL Fortran V13.1 commercial parallelizing compiler and runtime system. For four benchmarks where automatic doall parallelization was largely ineffective (speedups of under 2×), our implementation delivered speedups of 6.5×, 9.0×, 17.3×, and 17.5× on a 32-core IBM Power7 SMP system, thereby showing that doacross parallelization can be a valuable technique to complement doall parallelization.
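The abstract describes doacross loops only in prose. As a rough illustration, here is a minimal Python sketch of a generic post/wait doacross schedule. It assumes a hypothetical two-deep loop nest, `a[i][j] = a[i-1][j] + a[i][j-1]`, whose outer iterations are dealt cyclically to worker threads. Each worker owns a single, monotonically increasing progress counter, so the synchronization state is one integer per worker regardless of the trip count. The names `sequential`, `doacross`, `owner`, and `progress` are hypothetical. This is not the paper's dependence-folding algorithm or the XL Fortran runtime. Because of CPython's global interpreter lock, it demonstrates the synchronization pattern rather than any speedup.

```python
import threading


def sequential(n, m):
    """Reference nest: a[i][j] depends on a[i-1][j] (carried by i) and a[i][j-1]."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def doacross(n, m, workers):
    """Pipelined (doacross) run of the same nest with one progress counter per worker."""
    a = [[1] * m for _ in range(n)]
    # progress[w] holds the linearized index i*m + j of the last cell worker w finished.
    progress = [-1] * workers
    cond = threading.Condition()

    def owner(i):
        # Outer iterations 1..n-1 are dealt to workers cyclically.
        return (i - 1) % workers

    def run(w):
        for i in range(1 + w, n, workers):
            for j in range(1, m):
                if i > 1:
                    # Wait until the owner of row i-1 has finished column j of that row.
                    src, need = owner(i - 1), (i - 1) * m + j
                    with cond:
                        cond.wait_for(lambda: progress[src] >= need)
                a[i][j] = a[i - 1][j] + a[i][j - 1]
                # Post: publish this worker's progress and wake any waiters.
                with cond:
                    progress[w] = i * m + j
                    cond.notify_all()

    threads = [threading.Thread(target=run, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```

For example, `doacross(8, 6, 4)` returns the same array as `sequential(8, 6)`, with `a[7][5]` equal to 792 (the binomial coefficient C(12, 5)). A single counter per worker suffices here because each worker visits its cells in increasing linearized order, so "counter ≥ i*m + j" implies that cell (i, j) of that worker's rows is done.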