Abstract
Greedy algorithms are practitioners’ best friends—they are intuitive, are simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently sequential, and it is not clear how to take advantage of the extra processing power.
Our main result is a powerful sampling technique that aids in parallelization of sequential algorithms. Armed with this primitive, we then adapt a broad class of greedy algorithms to the MapReduce paradigm; this class includes maximum cover and submodular maximization subject to p-system constraint problems. Our method yields efficient algorithms that run in a logarithmic number of rounds while obtaining solutions that are arbitrarily close to those produced by the standard sequential greedy algorithm. We begin with algorithms for modular maximization subject to a matroid constraint and then extend this approach to obtain approximation algorithms for submodular maximization subject to knapsack or p-system constraints.
- R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. 2009. Diversifying search results. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM’09). ACM, New York, NY, 5--14. DOI:http://dx.doi.org/10.1145/1498759.1498766 Google ScholarDigital Library
- A. Arasu, S. Chaudhuri, and R. Kaushik. 2009. Learning string transformations from examples. PVLDB 2, 1 (2009), 514--525. Google ScholarDigital Library
- M. Babaioff, N. Immorlica, and R. Kleinberg. 2007. Matroids, secretary problems, and online mechanisms. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’07), New Orleans, LA, January 7-9, 2007. 434--443. Google ScholarDigital Library
- B. Bahmani, R. Kumar, and S. Vassilvitskii. 2012a. Densest subgraph in streaming and Map-Reduce. PVLDB 5, 1 (2012), 454--465. Google ScholarDigital Library
- B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. 2012b. Scalable K-Means++. PVLDB 5, 7 (2012), 622--633. Google ScholarDigital Library
- Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. 2004. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci. 68, 4 (2004), 702--732. Google ScholarDigital Library
- M. H. Bateni, M. T. Hajiaghayi, and M. Zadimoghaddam. 2013. Submodular secretary problem and extensions. ACM Trans. Algorithms 9, 4 (2013), 32. Google ScholarDigital Library
- B. Berger, J. Rompel, and P. W. Shor. 1994. Efficient NC algorithms for set cover with applications to learning and geometry. J. Comput. Syst. Sci. 49, 3 (1994), 454--477. Google ScholarDigital Library
- G. E. Blelloch, R. Peng, and K. Tangwongsan. 2011. Linear-work greedy parallel approximate set cover and variants. In Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11), San Jose, CA, June 4-6, 2011 (Co-located with FCRC 2011). 23--32. Google ScholarDigital Library
- G. Călinescu, C. Chekuri, M. Pál, and J. Vondrák. 2011. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput. 40, 6 (2011), 1740--1766. Google ScholarDigital Library
- G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri. 2011. Efficient diversification of web search results. PVLDB 4, 7 (2011), 451--459. Google ScholarDigital Library
- A. Chakrabarti, S. Khot, and X. Sun. 2003. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In 18th Annual IEEE Conference on Computational Complexity (Complexity 2003), Aarhus, Denmark, July 7-10, 2003. 107--117.Google Scholar
- M. Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of the 3rd International Workshop on Approximation Algorithms for Combinatorial Optimization (APPROX’00), Saarbrücken, Germany, September 5-8, 2000. 84--95. Google ScholarDigital Library
- W. Chen, Y. Wang, and S. Yang. 2009. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28 - July 1, 2009. 199--208. Google ScholarDigital Library
- F. Chierichetti, R. Kumar, and A. Tomkins. 2010. Max-cover in map-reduce. In Proceedings of the 19th International Conference on World Wide Web (WWW’10), Raleigh, NC, April 26-30, 2010. 231--240. Google ScholarDigital Library
- G. Cormode, H. J. Karloff, and A. Wirth. 2010. Set cover algorithms for very large datasets. In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM’10), Toronto, Ontario, Canada, October 26-30, 2010. 479--488. Google ScholarDigital Library
- B. C. Dean, M. X. Goemans, and J. Vondrák. 2005. Adaptivity and approximation for stochastic packing problems. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’05), Vancouver, British Columbia, Canada, January 23-25, 2005. 395--404. Google ScholarDigital Library
- J. Dean and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In 6th Symposium on Operating System Design and Implementation (OSDI’04), San Francisco, CA, December 6-8, 2004. 137--150. Google ScholarDigital Library
- E. W. Dijkstra. 1959. A note on two problems in connexion with graphs. Numer. Math. 1, 1 (1959), 269--271. Google ScholarDigital Library
- J. Edmonds. 1971. Matroids and the greedy algorithm. Math. Program. 1 (1971), 126--136.Google ScholarDigital Library
- A. Ene, S. Im, and B. Moseley. 2011. Fast clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, August 21-24, 2011. 681--689. Google ScholarDigital Library
- U. Feige. 1998. A threshold of Ln N for approximating set cover. J. ACM 45, 4 (July 1998), 634--652. DOI:http://dx.doi.org/10.1145/285055.285059 Google ScholarDigital Library
- M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. 1978. An analysis of approximation for maximizing submodular set functions II. Math. Prog. Study 8 (1978), 73--87.Google ScholarCross Ref
- L. R. Ford and D. R. Fulkerson. 1956. Maximal flow through a network. Can. J. Math. 8 (1956), 399--404.Google ScholarCross Ref
- D. Gale and L. S. Shapley. 1962. College admissions and the stability of marriage. Am. Math. Monthly 69, 1 (1962), 9--15. http://www.jstor.org/stable/2312726.Google ScholarCross Ref
- A. Goel, S. Guha, and K. Munagala. 2006. Asking the right questions: Model-driven optimization using probes. In Proceedings of the 25th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Chicago, IL, June 26-28, 2006. 203--212. Google ScholarDigital Library
- M. T. Goodrich, N. Sitchinava, and Q. Zhang. 2011. Sorting, searching, and simulation in the MapReduce framework. In Proceedings of the 22nd International Symposium on Algorithms and Computation (ISAAC’11), Yokohama, Japan, December 5-8, 2011. 374--383. Google ScholarDigital Library
- A. Goyal, F. Bonchi, and L. V. S. Lakshmana. 2012. A data-based approach to social influence maximization. PVLDB 5, 1 (2012). Google ScholarDigital Library
- A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. 2010. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In Proceedings of the 6th International Workshop on Internet and Network Economics (WINE’10), Stanford, CA, December 13-17, 2010. 246--257. Google ScholarDigital Library
- E. Hazan, S. Safra, and O. Schwartz. 2006. On the complexity of approximating k-set packing. Comput. Complexity 15, 1 (2006), 20--39. Google ScholarDigital Library
- D. A. Huffman. 1952. A method for the construction of minimum-redundancy codes. Proceedings of the Institute of Radio Engineers 40, 9 (1952), 1098--1101.Google Scholar
- T. A. Jenkyns. 1976. The efficacy of the “greedy” algorithm. In Proceedings of 7th South Eastern Conference on Combinatorics, Graph Theory and Computing. 341--350.Google Scholar
- H. J. Karloff, S. Suri, and S. Vassilvitskii. 2010. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’10), Austin, Texas, January 17-19, 2010. 938--948. Google ScholarDigital Library
- D. Kempe, J. M. Kleinberg, and É. Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, August 24-27, 2003. 137--146. Google ScholarDigital Library
- B. Korte and D. Hausmann. 1978. An analysis of the greedy heuristic for independence systems. Ann. Discrete Math. 2 (1978), 65--74.Google ScholarCross Ref
- R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. 2013. Fast greedy algorithms in mapreduce and streaming. In 25th ACM Symposium on Parallelism in Algorithms and Architectures, (SPAA’13), Montreal, QC, Canada, July 23-25, 2013. 1--10. Google ScholarDigital Library
- S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. 2011. Filtering: A method for solving graph problems in MapReduce. In Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11), San Jose, CA, June 4-6, 2011 (Co-located with FCRC’11). 85--94. Google ScholarDigital Library
- J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. M. VanBriesen, and N. S. Glance. 2007. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, August 12-15, 2007. 420--429. Google ScholarDigital Library
- B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. 2013. Distributed submodular maximization: Identifying representative elements in massive data. In Neural Information Processing Systems (NIPS).Google Scholar
- S. Muthukrishnan. 2005. Data streams: Algorithms and applications. Foundations Trends Theor. Comput. Sci. 1, 2 (2005). Google ScholarDigital Library
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximation for maximizing submodular set functions I. Math. Prog. 14 (1978), 265--294.Google ScholarDigital Library
- A. Pietracaprina, G. Pucci, M. Riondato, F. Silvestri, and E. Upfal. 2012. Space-round tradeoffs for MapReduce computations. In International Conference on Supercomputing (ICS’12), Venice, Italy, June 25-29, 2012. 235--244. Google ScholarDigital Library
- L. Rokach and O. Maimon. 2005. Top-down induction of decision trees classifiers - a survey. IEEE Tran. Syst. Man Cybernetics Part C 35, 4 (2005), 476--487. Google ScholarDigital Library
- B. Saha and L. Getoor. 2009. On maximum coverage in the streaming model & application to multi-topic blog-watch. In Proceedings of the SIAM International Conference on Data Mining (SDM’’09), Sparks, Nevada, April 30-May 2, 2009. 697--708.Google Scholar
- A. D. Sarma, A. Lall, D. Nanongkai, R. J. Lipton, and J. (Jim) Xu. 2011. Representative skylines using threshold-based preference distributions. In Proceedings of the 27th International Conference on Data Engineering (ICDE’11), Hannover, Germany, April 11-16, 2011. 387--398. Google ScholarDigital Library
- A. Schrijver. 2002. Combinatorial Optimization. Vol. B. Springer.Google Scholar
- L. G. Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (1990), 103--111. Google ScholarDigital Library
- J. Vondrak. 2007. Submodularity in Combinatorial Optimization. Ph.D. Dissertation. Charles University, Prague.Google Scholar
- C. Wang, W. Chen, and Y. Wang. 2012. Scalable influence maximization for independent cascade model in large-scale social networks. Data Min. Knowl. Discov. 25, 3 (2012), 545--576.Google ScholarCross Ref
Index Terms
- Fast Greedy Algorithms in MapReduce and Streaming
Recommendations
Fast greedy algorithms in mapreduce and streaming
SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architecturesGreedy algorithms are practitioners' best friends - they are intuitive, simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently ...
Randomized greedy algorithms for covering problems
GECCO '18: Proceedings of the Genetic and Evolutionary Computation ConferenceGreedy algorithms provide a fast and often also effective solution to many combinatorial optimization problems. However, it is well known that they sometimes lead to low quality solutions on certain instances. In this paper, we explore the use of ...
Greedy by Chance - Stochastic Greedy Algorithms
ICAS '10: Proceedings of the 2010 Sixth International Conference on Autonomic and Autonomous SystemsFor many complex combinatorial optimization problems, obtaining good solutions quickly is of value either by itself or as part of an exact algorithm. Greedy algorithms to obtain such solutions are known for many problems. In this paper we present ...
Comments