research-article

Low depth cache-oblivious algorithms

Authors:
Guy E. Blelloch

Carnegie Mellon University, Pittsburgh, USA

Carnegie Mellon University, Pittsburgh, USA
View Profile

,
Phillip B. Gibbons

Intel Research Pittsburgh, Pittsburgh, USA

Intel Research Pittsburgh, Pittsburgh, USA
View Profile

,
Harsha Vardhan Simhadri

Carnegie Mellon University, Pittsuburgh, USA

Carnegie Mellon University, Pittsuburgh, USA
View Profile

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architecturesJune 2010Pages 189–199https://doi.org/10.1145/1810479.1810519

Published:13 June 2010Publication History

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

Pages 189–199

ABSTRACT

In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators.

Using known mappings, our results lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. We generalize these mappings to multi-level cache hierarchies of private or shared caches, implying that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms we propose.

References

U. A. Acar, G. E. Blelloch, and R. D. Blumofe. The data locality of work stealing. Theory of Computing Systems, 35(3), 2002.Google Scholar
A. Aggarwal, B. Alpern, A. Chandra, and M. Snir. A model for hierarchical memory. In ACM STOC'87, 1987. Google ScholarDigital Library
B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy model of computation. Algorithmica, 12, 1994.Google Scholar
B. Alpern, L. Carter, and J. Ferrante. Modeling parallel computers as memory hierarchies. In Proc. 1993 Conf. on Programming Models for Massively Parallel Computers, 1993.Google ScholarCross Ref
L. Arge, M. A. Bender, E. D. Demaine, B. Holland-Minkley, and J. I. Munro. Cache-oblivious priority queue and graph algorithm applications. In ACM STOC'02, 2002. Google ScholarDigital Library
L. Arge, G. S. Brodal, and R. Fagerberg. Cache-oblivous data structures. In D. Mehta and S. Sahni, editors, Handbook of Data Structures and Applications. CRC Press, 2005.Google Scholar
L. Arge, M. T. Goodrich, M. Nelson, and N. Sitchinava. Fundamental parallel algorithms for private-cache chip multiprocessors. In ACM SPAA'08, 2008. Google ScholarDigital Library
L. Arge, M. T. Goodrich, and N. Sitchinava. Parallel external memory graph algorithms. Manuscript, 2009.Google Scholar
N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In ACM SPAA'98, 1998. Google ScholarDigital Library
M. A. Bender, G. S. Brodal, R. Fagerberg, R. Jacob, and E. Vicari. Optimal sparse matrix dense vector multiplication in the I/O-model. In ACM SPAA'07, 2007. Google ScholarDigital Library
M. A. Bender, B. C. Kuszmaul, S.-H. Teng, and K. Wang. Optimal cache-oblivious mesh layout. Computing Research Repository (CoRR) abs/0705.1033, 2007.Google Scholar
G. Bilardi. Models for parallel and hierarchical computation. In Proc. 4th ACM International Conf. on Computing Frontiers, 2007. Google ScholarDigital Library
G. E. Blelloch. Programming parallel algorithms. Commun. ACM, 39(3), 1996. Google ScholarDigital Library
G. E. Blelloch, R. A. Chowdhury, P. B. Gibbons, V. Ramachandran, S. Chen, and M. Kozuch. Provably good multicore cache performance for divide-and-conquer algorithms. In ACM-SIAM SODA'08, 2008. Google ScholarDigital Library
G. E. Blelloch and P. B. Gibbons. Effectively sharing a cache among threads. In ACM SPAA'04, 2004. Google ScholarDigital Library
G. E. Blelloch, P. B. Gibbons, and Y. Matias. Provably efficient scheduling for languages with fine-grained parallelism. Journal of the ACM, 46(2), 1999. Google ScholarDigital Library
G. E. Blelloch, P. B. Gibbons, and H. V. Simhadri. Brief announcement: Low-depth cache oblivious sorting. In ACM SPAA'09, 2009. Google ScholarDigital Library
G. E. Blelloch, P. B. Gibbons, and H. V. Simhadri. Low-depth cache oblivious algorithms. Technical Report CMU-CS-TR-134, Computer Science Department, Carnegie Mellon University, 2009 http://reports-archive.adm.cs.cmu.edu/anon/2009/CMU-CS-09-134.pdf.Google Scholar
R. D. Blumofe, M. Frigo, C. F. Joerg, C. E. Leiserson, and K. H. Randall. Dag-consistent distributed shared memory. In IPPS'96, 1996. Google ScholarDigital Library
R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. J. Parallel Distrib. Comput., 37(1), 1996. Google ScholarDigital Library
R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM, 46(5), 1999. Google ScholarDigital Library
G. S. Brodal. Cache-oblivious algorithms and data structures. In Proc. 9th Scandinavian Workshop on Algorithm Theory, 2004. LNCS, vol. 3111. Springer.Google ScholarCross Ref
G. S. Brodal and R. Fagerberg. Cache oblivious distribution sweeping. In ICALP '02, 2002. LNCS, vol. 2380. Springer. Google ScholarDigital Library
G. S. Brodal, R. Fagerberg, and G. Moruz. Cache-aware and cache-oblivious adaptive sorting. In ICALP'05, 2005. LNCS, vol. 3580. Springer. Google ScholarDigital Library
G. S. Brodal, R. Fagerberg, and K. Vinther. Engineering a cache-oblivious sorting algorithm. ACM Journal of Experimental Algorithmics, 12, 2008. Google ScholarDigital Library
P. Charles, C. Donawa, K. Ebcioglu, C. Grothoff, A. Kielstra, C. von Praun, V. Saraswat, and V. Sarkar. X10: An object-oriented approach to non-uniform clustered computing. In Proc. ACM SIGPLAN Conf. on Object-Oriented Programming Languages and Applications, 2005. Google ScholarDigital Library
Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. In ACM-SIAM SODA'95, 1995. Google ScholarDigital Library
R. A. Chowdhury and V. Ramachandran. The cache-oblivious gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation. In ACM SPAA'07, 2007. Google ScholarDigital Library
R. A. Chowdhury and V. Ramachandran. Cache-efficient dynamic programming algorithms for multicores. In ACM SPAA'08, 2008. Google ScholarDigital Library
R. A. Chowdhury, V. Ramachandran, and F. Silvestri. Oblivious algorithms for multicore, network, and petascale computing. Manuscript, 2009.Google Scholar
R. A. Chowdhury, F. Silvestri, B. Blakeley, and V. Ramachandran. Oblivious algorithms for multicores and network of processors. In IEEE IPDPS'10, 2010.Google ScholarCross Ref
R. Cole and U. Vishkin. Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms. In ACM STOC'86, 1986. Google ScholarDigital Library
D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser, E. E. Santos, R. Subramonian, and T. von Eicken. Logp: Towards a realistic model of parallel computation. In ACM PPOPP'93, 1993. Google ScholarDigital Library
E. D. Demaine. Cache-oblivious algorithms and data structures. In Lecture Notes from the EEF Summer School on Massive Data Sets, LNCS. Springer-Verlag, 2002.Google Scholar
K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn, L. Leem, J. Y. Park, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: Programming the memory hierarchy. In Supercomputing'06, 2006. Google ScholarDigital Library
G. Franceschini. Proximity mergesort: Optimal in-place sorting in the cache-oblivious model. In ACM-SIAM SODA'04, 2004. Google ScholarDigital Library
W. D. Frazer and A. C. McKellar. Samplesort: A sampling approach to minimal storage tree sorting. Journal of the ACM, 17(3), 1970. Google ScholarDigital Library
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In IEEE FOCS'99, 1999. Google ScholarDigital Library
M. Frigo and V. Strumpen. The cache complexity of multithreaded cache oblivious algorithms. In ACM SPAA'06, 2006. Google ScholarDigital Library
J. Jaja. An Introduction to Parallel Algorithms. Addison-Wesley, 1992. Google ScholarDigital Library
P. Kumar. Cache oblivious algorithms. In U. Meyer, P. Sanders, and J. Sibeyn, editors, Algorithms for Memory Hierarchies. Springer, 2003.Google ScholarCross Ref
E. Ladan-Mozes and C. E. Leiserson. A consistency architecture for hierarchical shared caches. In ACM SPAA'08, 2008. Google ScholarDigital Library
R. J. Lipton and R. E. Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36, 1979.Google Scholar
OpenMP Architecture Review Board. OpenMP application program interface. Technical Report Version 3.0, 2008.Google Scholar
S. Rajasekaran and J. H. Reif. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM J. Comput., 18(3), 1989. Google ScholarDigital Library
J. E. Savage and M. Zubair. A unified model for multicore architectures. In Proc. 1st International Forum on Next-Generation Multicore/Manycore Technologies, 2008. Google ScholarDigital Library
D. D. Sleator and R. E. Tarjan. Amortized efficiency of list update and paging rules. Commun. ACM, 28(2), 1985. Google ScholarDigital Library
L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8), 1990. Google ScholarDigital Library
L. G. Valiant. A bridging model for multicore computing. In ESA, 2008. Google ScholarDigital Library

Index Terms

Recommendations

Cache-Oblivious Algorithms

This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: ...
Read More
Brief announcement: low depth cache-oblivious sorting
SPAA '09: Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures

Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifics (cache size and cache line size) of each level. In this paper, we describe cache-...
Read More
Effects of Cache Coherency in Multiprocessors

In many commercial multiprocessor systems, each processor accesses the memory through a private cache. One problem that could limit the extensibility of the system and its performance is the enforcement of cache coherence. A mechanism must exist which ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
June 2010
378 pages
ISBN:9781450300797
DOI:10.1145/1810479
General Chairs:
Friedhelm Meyer auf der Heide
University of Paderborn, Germany
,
Cynthia Phillips
Sandia National Laboratories, USA
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
cache-oblivious algorithms
graph algorithms
multiprocessors
parallel algorithms
schedulers
sorting
sparse-matrix vector multiply
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate447of1,461submissions,31%
Upcoming Conference
SPAA '24

Sponsor:

sigact

sigact

36th ACM Symposium on Parallelism in Algorithms and Architectures

June 17 - 21, 2024

Nantes , France
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 96
  Total Citations
  View Citations
- 913
  Total Downloads
- Downloads (Last 12 months)26
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Low depth cache-oblivious algorithms

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cache-Oblivious Algorithms

Brief announcement: low depth cache-oblivious sorting

Effects of Cache Coherency in Multiprocessors

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Low depth cache-oblivious algorithms

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cache-Oblivious Algorithms

Brief announcement: low depth cache-oblivious sorting

Effects of Cache Coherency in Multiprocessors

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media