skip to main content
10.1145/2486159.2486168acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article

Fast greedy algorithms in mapreduce and streaming

Published:23 July 2013Publication History

ABSTRACT

Greedy algorithms are practitioners' best friends - they are intuitive, simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently sequential, and it is not clear how to take advantage of the extra processing power.

Our main result is a powerful sampling technique that aids in parallelization of sequential algorithms. We then show how to use this primitive to adapt a broad class of greedy algorithms to the MapReduce paradigm; this class includes maximum cover and submodular maximization subject to p-system constraints. Our method yields efficient algorithms that run in a logarithmic number of rounds, while obtaining solutions that are arbitrarily close to those produced by the standard sequential greedy algorithm. We begin with algorithms for modular maximization subject to a matroid constraint, and then extend this approach to obtain approximation algorithms for submodular maximization subject to knapsack or p-system constraints. Finally, we empirically validate our algorithms, and show that they achieve the same quality of the solution as standard greedy algorithms but run in a substantially fewer number of rounds.

References

  1. R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5--14, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Arasu, S. Chaudhuri, and R. Kaushik. Learning string transformations from examples. PVLDB, 2(1):514--525, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Babaioff, N. Immorlica, and R. Kleinberg. Matroids, secretary problems, and online mechanisms. In SODA, pages 434--443, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and MapReduce. PVLDB, 5(1), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Bateni, M. Hajiaghayi, and M. Zadimoghaddam. Submodular secretary problem and extensions. In APPROX-RANDOM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Berger, Rompel, and Shor. Efficient NC algorithms for set cover with applications to learning and geometry. JCSS, 49:454--477, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. E. Blelloch, R. Peng, and K. Tangwongsan. Linear-work greedy parallel approximate set cover and variants. In SPAA, pages 23--32, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Calinescu, C. Chekuri, M. Pal, and J. Vondrak. Maximizing a submodular set function subject to a matroid constraint. SICOMP, To appear. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri. Efficient diversification of web search results. PVLDB, 4(7):451--459, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84--95, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD, pages 1029--1038, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD, pages 199--208, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in Map-reduce. In WWW, pages 231--240, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479--488, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. B. C. Dean, M. X. Goemans, and J. Vondrák. Adaptivity and approximation for stochastic packing problems. In SODA, pages 395--404, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, page 10, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1:126--136, 1971.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Ene, S. Im, and B. Moseley. Fast clustering using MapReduce. In KDD, pages 85--94, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. U. Feige. A threshold of ln n for approximating set cover. JACM, 45(4):634--652, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximation for maximizing submodular set functions II. Math. Prog. Study, 8:73--87, 1978.Google ScholarGoogle ScholarCross RefCross Ref
  21. A. Goel, S. Guha, and K. Munagala. Asking the right questions: model-driven optimization using probes. In PODS, pages 203--212, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. T. Goodrich, N. Sitchinava, and Q. Zhang. Sorting, searching, and simulation in the MapReduce framework. In ISAAC, pages 374--383, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. Goyal, F. Bonchi, and L. V. S. Lakshmana. A data-based approach to social influence maximization. PVLDB, 5(1), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In WINE, pages 246--257, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. Hazan, S. Safra, and O. Schwartz. On the complexity of approximating k-set packing. Computational Complexity, 15(1):20--39, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. T. A. Jenkyns. The efficacy of the "greedy" algorithm. In Proceedings of 7th South Eastern Conference on Combinatorics, Graph Theory and Computing, pages 341--350, 1976.Google ScholarGoogle Scholar
  27. H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for MapReduce. In SODA, pages 938--948, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137--146, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. B. Korte and D. Hausmann. An analysis of the greedy heuristic for independence systems. Annals of Discrete Math., 2:65--74, 1978.Google ScholarGoogle ScholarCross RefCross Ref
  30. S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In SPAA, pages 85--94, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420--429, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximation for maximizing submodular set functions I. Math. Prog., 14:265--294, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. Pietracaprina, G. Pucci, M. Riondato, F. Silvestri, and E. Upfal. Space-round tradeoffs for mapreduce computations. CoRR, abs/1111.2228, 2011.Google ScholarGoogle Scholar
  35. B. Saha and L. Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697--708, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  36. A. D. Sarma, A. Lall, D. Nanongkai, R. J. Lipton, and J. J. Xu. Representative skylines using threshold-based preference distributions. In ICDE, pages 387--398, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. J. Vondrak. Submodularity in Combinatorial Optimization. PhD thesis, Charles University, Prague, 2007.Google ScholarGoogle Scholar

Index Terms

  1. Fast greedy algorithms in mapreduce and streaming

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
      July 2013
      348 pages
      ISBN:9781450315722
      DOI:10.1145/2486159

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 July 2013

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SPAA '13 Paper Acceptance Rate31of130submissions,24%Overall Acceptance Rate447of1,461submissions,31%

      Upcoming Conference

      SPAA '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader