research-article

Fast greedy algorithms in mapreduce and streaming

Authors:
Ravi Kumar

Google, Mountain View, CA, USA

Google, Mountain View, CA, USA
View Profile

,
Benjamin Moseley

Toyota Technological Institute at Chicago, Chicago, IL, USA

Toyota Technological Institute at Chicago, Chicago, IL, USA
View Profile

,
Sergei Vassilvitskii

Google, Mountain View, CA, USA

Google, Mountain View, CA, USA
View Profile

,
Andrea Vattani

University of California, San Diego, CA, USA

University of California, San Diego, CA, USA
View Profile

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architecturesJuly 2013Pages 1–10https://doi.org/10.1145/2486159.2486168

Published:23 July 2013Publication History

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures

Pages 1–10

ABSTRACT

Greedy algorithms are practitioners' best friends - they are intuitive, simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently sequential, and it is not clear how to take advantage of the extra processing power.

Our main result is a powerful sampling technique that aids in parallelization of sequential algorithms. We then show how to use this primitive to adapt a broad class of greedy algorithms to the MapReduce paradigm; this class includes maximum cover and submodular maximization subject to p-system constraints. Our method yields efficient algorithms that run in a logarithmic number of rounds, while obtaining solutions that are arbitrarily close to those produced by the standard sequential greedy algorithm. We begin with algorithms for modular maximization subject to a matroid constraint, and then extend this approach to obtain approximation algorithms for submodular maximization subject to knapsack or p-system constraints. Finally, we empirically validate our algorithms, and show that they achieve the same quality of the solution as standard greedy algorithms but run in a substantially fewer number of rounds.

References

R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5--14, 2009. Google ScholarDigital Library
A. Arasu, S. Chaudhuri, and R. Kaushik. Learning string transformations from examples. PVLDB, 2(1):514--525, 2009. Google ScholarDigital Library
M. Babaioff, N. Immorlica, and R. Kleinberg. Matroids, secretary problems, and online mechanisms. In SODA, pages 434--443, 2007. Google ScholarDigital Library
B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and MapReduce. PVLDB, 5(1), 2012. Google ScholarDigital Library
M. Bateni, M. Hajiaghayi, and M. Zadimoghaddam. Submodular secretary problem and extensions. In APPROX-RANDOM, 2010. Google ScholarDigital Library
Berger, Rompel, and Shor. Efficient NC algorithms for set cover with applications to learning and geometry. JCSS, 49:454--477, 1994. Google ScholarDigital Library
G. E. Blelloch, R. Peng, and K. Tangwongsan. Linear-work greedy parallel approximate set cover and variants. In SPAA, pages 23--32, 2011. Google ScholarDigital Library
G. Calinescu, C. Chekuri, M. Pal, and J. Vondrak. Maximizing a submodular set function subject to a matroid constraint. SICOMP, To appear. Google ScholarDigital Library
G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri. Efficient diversification of web search results. PVLDB, 4(7):451--459, 2011. Google ScholarDigital Library
M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84--95, 2000. Google ScholarDigital Library
W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD, pages 1029--1038, 2010. Google ScholarDigital Library
W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social networks. In KDD, pages 199--208, 2009. Google ScholarDigital Library
F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in Map-reduce. In WWW, pages 231--240, 2010. Google ScholarDigital Library
G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479--488, 2010. Google ScholarDigital Library
B. C. Dean, M. X. Goemans, and J. Vondrák. Adaptivity and approximation for stochastic packing problems. In SODA, pages 395--404, 2005. Google ScholarDigital Library
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, page 10, 2004. Google ScholarDigital Library
J. Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1:126--136, 1971.Google ScholarDigital Library
A. Ene, S. Im, and B. Moseley. Fast clustering using MapReduce. In KDD, pages 85--94, 2011. Google ScholarDigital Library
U. Feige. A threshold of ln n for approximating set cover. JACM, 45(4):634--652, 1998. Google ScholarDigital Library
M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximation for maximizing submodular set functions II. Math. Prog. Study, 8:73--87, 1978.Google ScholarCross Ref
A. Goel, S. Guha, and K. Munagala. Asking the right questions: model-driven optimization using probes. In PODS, pages 203--212, 2006. Google ScholarDigital Library
M. T. Goodrich, N. Sitchinava, and Q. Zhang. Sorting, searching, and simulation in the MapReduce framework. In ISAAC, pages 374--383, 2011. Google ScholarDigital Library
A. Goyal, F. Bonchi, and L. V. S. Lakshmana. A data-based approach to social influence maximization. PVLDB, 5(1), 2012. Google ScholarDigital Library
A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In WINE, pages 246--257, 2010. Google ScholarDigital Library
E. Hazan, S. Safra, and O. Schwartz. On the complexity of approximating k-set packing. Computational Complexity, 15(1):20--39, 2006. Google ScholarDigital Library
T. A. Jenkyns. The efficacy of the "greedy" algorithm. In Proceedings of 7th South Eastern Conference on Combinatorics, Graph Theory and Computing, pages 341--350, 1976.Google Scholar
H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for MapReduce. In SODA, pages 938--948, 2010. Google ScholarDigital Library
D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137--146, 2003. Google ScholarDigital Library
B. Korte and D. Hausmann. An analysis of the greedy heuristic for independence systems. Annals of Discrete Math., 2:65--74, 1978.Google ScholarCross Ref
S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In SPAA, pages 85--94, 2011. Google ScholarDigital Library
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420--429, 2007. Google ScholarDigital Library
S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. Google ScholarDigital Library
G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximation for maximizing submodular set functions I. Math. Prog., 14:265--294, 1978.Google ScholarDigital Library
A. Pietracaprina, G. Pucci, M. Riondato, F. Silvestri, and E. Upfal. Space-round tradeoffs for mapreduce computations. CoRR, abs/1111.2228, 2011.Google Scholar
B. Saha and L. Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697--708, 2009.Google ScholarCross Ref
A. D. Sarma, A. Lall, D. Nanongkai, R. J. Lipton, and J. J. Xu. Representative skylines using threshold-based preference distributions. In ICDE, pages 387--398, 2011. Google ScholarDigital Library
L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, 1990. Google ScholarDigital Library
J. Vondrak. Submodularity in Combinatorial Optimization. PhD thesis, Charles University, Prague, 2007.Google Scholar

Index Terms

Fast greedy algorithms in mapreduce and streaming
1. Theory of computation
  1. Randomness, geometry and discrete structures

Recommendations

Randomized greedy algorithms for covering problems
GECCO '18: Proceedings of the Genetic and Evolutionary Computation Conference

Greedy algorithms provide a fast and often also effective solution to many combinatorial optimization problems. However, it is well known that they sometimes lead to low quality solutions on certain instances. In this paper, we explore the use of ...
Read More
Fast Greedy Algorithms in MapReduce and Streaming
Special Issue for SPAA 2013

Greedy algorithms are practitioners’ best friends—they are intuitive, are simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently ...
Read More
Greedy by Chance - Stochastic Greedy Algorithms
ICAS '10: Proceedings of the 2010 Sixth International Conference on Autonomic and Autonomous Systems

For many complex combinatorial optimization problems, obtaining good solutions quickly is of value either by itself or as part of an exact algorithm. Greedy algorithms to obtain such solutions are known for many problems. In this paper we present ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
July 2013
348 pages
ISBN:9781450315722
DOI:10.1145/2486159
General Chair:
Guy Blelloch
Carnegie Mellon University, USA
,
Program Chair:
Berthold Vöcking
RWTH Aachen University, Germany
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 July 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
algorithm analysis
approximation algorithms
distributed computing
greedy algorithms
map-reduce
submodular function
Qualifiers
- research-article
Conference

Acceptance Rates
SPAA '13 Paper Acceptance Rate31of130submissions,24%Overall Acceptance Rate447of1,461submissions,31%
More
Upcoming Conference
SPAA '24

Sponsor:

sigact

sigact

36th ACM Symposium on Parallelism in Algorithms and Architectures

June 17 - 21, 2024

Nantes , France
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 51
  Total Citations
  View Citations
- 660
  Total Downloads
- Downloads (Last 12 months)2
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fast greedy algorithms in mapreduce and streaming

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures

ABSTRACT

References

Cited By

Index Terms

Recommendations

Randomized greedy algorithms for covering problems

Fast Greedy Algorithms in MapReduce and Streaming

Greedy by Chance - Stochastic Greedy Algorithms