skip to main content
10.1145/2935764.2935799acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article
Open Access

Shuffles and Circuits: (On Lower Bounds for Modern Parallel Computation)

Published:11 July 2016Publication History

ABSTRACT

The goal of this paper is to identify fundamental limitations on how efficiently algorithms implemented on platforms such as MapReduce and Hadoop can compute the central problems in the motivating application domains, such as graph connectivity problems.

We introduce an abstract model of massively parallel computation, where essentially the only restrictions are that the "fan-in" of each machine is limited to s bits, where s is smaller than the input size n, and that computation proceeds in synchronized rounds, with no communication between different machines within a round. Lower bounds on the round complexity of a problem in this model apply to every computing platform that shares the most basic design principles of MapReduce-type systems.

We prove that computations in our model that use few rounds can be represented as low-degree polynomials over the reals. This connection allows us to translate a lower bound on the (approximate) polynomial degree of a Boolean function to a lower bound on the round complexity of every (randomized) massively parallel computation of that function. These lower bounds apply even in the "unbounded width" version of our model, where the number of machines can be arbitrarily large. As one example of our general results, computing any non-trivial monotone graph property --- such as connectivity --- requires a super-constant number of rounds when every machine can accept only a sub-polynomial (in n) number of input bits s.

Finally, we prove that, in two senses, our lower bounds are the best one could hope for. For the unbounded-width model, we prove a matching upper bound. Restricting to a polynomial number of machines, we show that asymptotically better lower bounds require proving that P ≠ NC1.

References

  1. Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, and Jeffrey D. Ullman. Upper and lower bounds on the cost of a map-reduce computation. PVLDB, 6(4):277--288, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 574--583, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. MohammadHossein Bateni, Aditya Bhaskara, Silvio Lattanzi, and Vahab S. Mirrokni. Distributed balanced clustering via mapping coresets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8--13 2014, Montreal, Quebec, Canada, pages 2591--2599, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Robert Beals, Harry Buhrman, Richard Cleve, Michele Mosca, and Ronald De Wolf. Quantum lower bounds by polynomials. Journal of the ACM (JACM), 48(4):778--797, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA - June 22 - 27, 2013, pages 273--284, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Harry Buhrman and Ronald De Wolf. Complexity measures and decision tree complexity: a survey. Theoretical Computer Science, 288(1):21--43, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Stephen Cook, Cynthia Dwork, and Rüdiger Reischuk. Upper and lower time bounds for parallel random access machines without simultaneous writes. SIAM Journal on Computing, 15(1):87--97, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In 6th Symposium on Operating System Design and Implementation (OSDI 2004), San Francisco, California, USA, December 6--8, 2004, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Martin Dietzfelbinger, Mirosław Kutyłowski, and Rüdiger Reischuk. Exact lower time bounds for computing boolean functions on crew prams. Journal of Computer and System Sciences, 48(2):231--254, 1994. Preliminary version in SPAA '90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. O'Donnell. Analysis of Boolean Functions. Cambridge, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Andrew Drucker, Fabian Kuhn, and Rotem Oshman. On the power of the congested clique model. In Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing (PODC), pages 367--376. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Alina Ene, Sungjin Im, and Benjamin Moseley. Fast clustering using mapreduce. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21--24, 2011, pages 681--689, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Clifford Stein, and Zoya Svitkina. On distributing symmetric streaming computations. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20--22, 2008, pages 710--719, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Benjamin Fish, Jeremy Kun, Ádám Dániel Lelkes, Lev Reyzin, and György Turán. On the computational complexity of mapreduce. CoRR, abs/1410.0245, 2014.Google ScholarGoogle Scholar
  15. Ashish Goel and Kamesh Munagala. Complexity measures for map-reduce, and comparison to parallel computing. CoRR, abs/1211.6526, 2012.Google ScholarGoogle Scholar
  16. Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the mapreduce framework. In Algorithms and Computation - 22nd International Symposium, ISAAC 2011, Yokohama, Japan, December 5--8, 2011. Proceedings, pages 374--383, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. James W. Hegeman and Sriram V. Pemmaraju. Lessons from the congested clique applied to MapReduce. Theoretical Computer Science, 608:268--281, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Pavel Hrubes and Anup Rao. Circuits with medium fan-in. In Proceedings of CCC, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Riko Jacob, Tobias Lieber, and Nodari Sitchinava. On the complexity of list ranking in the parallel external memory model. In Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Budapest, Hungary, August 25--29, 2014. Proceedings, Part II, pages 384--395, 2014.Google ScholarGoogle Scholar
  20. Jeff Kahn, Michael Saks, and Dean Sturtevant. A topological approach to evasiveness. Combinatorica, 4(4):297--306, 1984.Google ScholarGoogle ScholarCross RefCross Ref
  21. Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17--19, 2010, pages 938--948, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Hartmut Klauck, Danupon Nanongkai, Gopal Pandurangan, and Peter Robinson. Distributed computation of large-scale graph problems. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4--6, 2015, pages 391--410, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Daniel J. Kleitman and D. J. Kwiatkowski. Further results on the aanderaa-rosenberg conjecture. J. Comb. Theory, Ser. B, 28(1):85--95, 1980.Google ScholarGoogle ScholarCross RefCross Ref
  24. Torsten Korneffel and Eberhard Triesch. An asymptotic bound for the complexity of monotone graph properties. Combinatorica, 30(6):735--743, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. In 25th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '13, Montreal, QC, Canada - July 23 - 25, 2013, pages 1--10, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, San Jose, CA, USA, June 4--6, 2011 (Co-located with FCRC 2011), pages 85--94, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. Leskovec, A. Rajaraman, and J. D. Ullman. Mining of Massive Datasets. Cambridge, 2014. Second Edition. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Vahab Mirrokni and Morteza Zadimoghaddam. Randomized composable core-sets for distributed submodular maximization. In Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Noam Nisan. Crew prams and decision trees. SIAM Journal on Computing, 20(6):999--1007, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Noam Nisan and Mario Szegedy. On the degree of boolean functions as real polynomials. Computational complexity, 4(4):301--313, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. David Peleg. Distributed computing. SIAM Monographs on discrete mathematics and applications, 5, 2000.Google ScholarGoogle Scholar
  32. Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, and Eli Upfal. Space-round tradeoffs for mapreduce computations. In International Conference on Supercomputing, ICS'12, Venice, Italy, June 25--29, 2012, pages 235--244, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A.A. Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mathematical notes of the Academy of Sciences of the USSR, 41(4):333--338, 1987.Google ScholarGoogle Scholar
  34. Ronald L Rivest and Jean Vuillemin. A generalization and proof of the aanderaa-rosenberg conjecture. In Proceedings of seventh annual ACM symposium on Theory of computing, pages 6--11. ACM, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Arnold L Rosenberg. On the time required to recognize properties of graphs: A problem. ACM SIGACT News, 5(4):15--16, 1973. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Robert Scheidweiler and Eberhard Triesch. A lower bound for the complexity of monotone graph properties. SIAM Journal on Discrete Mathematics, 27(1):257--265, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  37. Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Heribert Vollmer. Introduction to circuit complexity: a uniform approach. Springer Science & Business Media, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. David P. Woodruff and Qin Zhang. When distributed computation is communication expensive. In Distributed Computing - 27th International Symposium, DISC 2013, Jerusalem, Israel, October 14--18, 2013. Proceedings, pages 16--30, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Shuffles and Circuits: (On Lower Bounds for Modern Parallel Computation)

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SPAA '16: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures
        July 2016
        492 pages
        ISBN:9781450342100
        DOI:10.1145/2935764

        Copyright © 2016 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 July 2016

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate447of1,461submissions,31%

        Upcoming Conference

        SPAA '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader