ABSTRACT
Greedy algorithms are practitioners' best friends - they are intuitive, simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently sequential, and it is not clear how to take advantage of the extra processing power.
Our main result is a powerful sampling technique that aids in parallelization of sequential algorithms. We then show how to use this primitive to adapt a broad class of greedy algorithms to the MapReduce paradigm; this class includes maximum cover and submodular maximization subject to p-system constraints. Our method yields efficient algorithms that run in a logarithmic number of rounds, while obtaining solutions that are arbitrarily close to those produced by the standard sequential greedy algorithm. We begin with algorithms for modular maximization subject to a matroid constraint, and then extend this approach to obtain approximation algorithms for submodular maximization subject to knapsack or p-system constraints. Finally, we empirically validate our algorithms, and show that they achieve the same quality of the solution as standard greedy algorithms but run in a substantially fewer number of rounds.
- R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5--14, 2009. Google ScholarDigital Library
- A. Arasu, S. Chaudhuri, and R. Kaushik. Learning string transformations from examples. PVLDB, 2(1):514--525, 2009. Google ScholarDigital Library
- M. Babaioff, N. Immorlica, and R. Kleinberg. Matroids, secretary problems, and online mechanisms. In SODA, pages 434--443, 2007. Google ScholarDigital Library
- B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and MapReduce. PVLDB, 5(1), 2012. Google ScholarDigital Library
- M. Bateni, M. Hajiaghayi, and M. Zadimoghaddam. Submodular secretary problem and extensions. In APPROX-RANDOM, 2010. Google ScholarDigital Library
- Berger, Rompel, and Shor. Efficient NC algorithms for set cover with applications to learning and geometry. JCSS, 49:454--477, 1994. Google ScholarDigital Library
- G. E. Blelloch, R. Peng, and K. Tangwongsan. Linear-work greedy parallel approximate set cover and variants. In SPAA, pages 23--32, 2011. Google ScholarDigital Library
- G. Calinescu, C. Chekuri, M. Pal, and J. Vondrak. Maximizing a submodular set function subject to a matroid constraint. SICOMP, To appear. Google ScholarDigital Library
- G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri. Efficient diversification of web search results. PVLDB, 4(7):451--459, 2011. Google ScholarDigital Library
- M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84--95, 2000. Google ScholarDigital Library
- W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD, pages 1029--1038, 2010. Google ScholarDigital Library
- W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD, pages 199--208, 2009. Google ScholarDigital Library
- F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in Map-reduce. In WWW, pages 231--240, 2010. Google ScholarDigital Library
- G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479--488, 2010. Google ScholarDigital Library
- B. C. Dean, M. X. Goemans, and J. Vondrák. Adaptivity and approximation for stochastic packing problems. In SODA, pages 395--404, 2005. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, page 10, 2004. Google ScholarDigital Library
- J. Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1:126--136, 1971.Google ScholarDigital Library
- A. Ene, S. Im, and B. Moseley. Fast clustering using MapReduce. In KDD, pages 85--94, 2011. Google ScholarDigital Library
- U. Feige. A threshold of ln n for approximating set cover. JACM, 45(4):634--652, 1998. Google ScholarDigital Library
- M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximation for maximizing submodular set functions II. Math. Prog. Study, 8:73--87, 1978.Google ScholarCross Ref
- A. Goel, S. Guha, and K. Munagala. Asking the right questions: model-driven optimization using probes. In PODS, pages 203--212, 2006. Google ScholarDigital Library
- M. T. Goodrich, N. Sitchinava, and Q. Zhang. Sorting, searching, and simulation in the MapReduce framework. In ISAAC, pages 374--383, 2011. Google ScholarDigital Library
- A. Goyal, F. Bonchi, and L. V. S. Lakshmana. A data-based approach to social influence maximization. PVLDB, 5(1), 2012. Google ScholarDigital Library
- A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In WINE, pages 246--257, 2010. Google ScholarDigital Library
- E. Hazan, S. Safra, and O. Schwartz. On the complexity of approximating k-set packing. Computational Complexity, 15(1):20--39, 2006. Google ScholarDigital Library
- T. A. Jenkyns. The efficacy of the "greedy" algorithm. In Proceedings of 7th South Eastern Conference on Combinatorics, Graph Theory and Computing, pages 341--350, 1976.Google Scholar
- H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for MapReduce. In SODA, pages 938--948, 2010. Google ScholarDigital Library
- D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137--146, 2003. Google ScholarDigital Library
- B. Korte and D. Hausmann. An analysis of the greedy heuristic for independence systems. Annals of Discrete Math., 2:65--74, 1978.Google ScholarCross Ref
- S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In SPAA, pages 85--94, 2011. Google ScholarDigital Library
- J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420--429, 2007. Google ScholarDigital Library
- S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. Google ScholarDigital Library
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximation for maximizing submodular set functions I. Math. Prog., 14:265--294, 1978.Google ScholarDigital Library
- A. Pietracaprina, G. Pucci, M. Riondato, F. Silvestri, and E. Upfal. Space-round tradeoffs for mapreduce computations. CoRR, abs/1111.2228, 2011.Google Scholar
- B. Saha and L. Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697--708, 2009.Google ScholarCross Ref
- A. D. Sarma, A. Lall, D. Nanongkai, R. J. Lipton, and J. J. Xu. Representative skylines using threshold-based preference distributions. In ICDE, pages 387--398, 2011. Google ScholarDigital Library
- L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, 1990. Google ScholarDigital Library
- J. Vondrak. Submodularity in Combinatorial Optimization. PhD thesis, Charles University, Prague, 2007.Google Scholar
Index Terms
- Fast greedy algorithms in mapreduce and streaming
Recommendations
Randomized greedy algorithms for covering problems
GECCO '18: Proceedings of the Genetic and Evolutionary Computation ConferenceGreedy algorithms provide a fast and often also effective solution to many combinatorial optimization problems. However, it is well known that they sometimes lead to low quality solutions on certain instances. In this paper, we explore the use of ...
Fast Greedy Algorithms in MapReduce and Streaming
Special Issue for SPAA 2013Greedy algorithms are practitioners’ best friends—they are intuitive, are simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently ...
Greedy by Chance - Stochastic Greedy Algorithms
ICAS '10: Proceedings of the 2010 Sixth International Conference on Autonomic and Autonomous SystemsFor many complex combinatorial optimization problems, obtaining good solutions quickly is of value either by itself or as part of an exact algorithm. Greedy algorithms to obtain such solutions are known for many problems. In this paper we present ...
Comments