research-article

Open Access

Randomized Composable Core-sets for Distributed Submodular Maximization

Authors:
Vahab Mirrokni, Google Research, New York, USA
Morteza Zadimoghaddam, Google Research, New York, USA

STOC '15: Proceedings of the forty-seventh annual ACM symposium on Theory of Computing, June 2015, Pages 153–162
https://doi.org/10.1145/2746539.2746624

Published: 14 June 2015

ABSTRACT

An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has recently been applied to solve diversity maximization problems as well as several clustering problems [7,15,8]. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique [15]. In this paper, we focus on efficient construction of a randomized variant of composable core-sets where the above idea is applied on a random clustering of the data. We employ this technique for the coverage, monotone, and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in distributed and streaming settings. The effectiveness of this technique has been confirmed empirically for several machine learning applications [22], and our proof provides a theoretical foundation for this idea.
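
For readers who want the objects above in symbols, the following block restates the standard definitions the abstract relies on. The notation (ground set $N$, function $f$, budget $k$, number of pieces $m$, pieces $T_i$, core-set map $\mathrm{ALG}$, and the shorthand $f_k$) is assumed here for illustration and is not taken from this page; the paper's own formal statement may differ in detail.

```latex
% Assumed notation: N is the ground set, f : 2^N -> R_{>=0}, k is the cardinality budget.

% Submodularity (diminishing marginal returns):
\[
  f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B)
  \qquad \text{for all } A \subseteq B \subseteq N,\; x \in N \setminus B .
\]

% Monotonicity:
\[
  f(A) \le f(B) \qquad \text{for all } A \subseteq B \subseteq N .
\]

% Best value achievable with at most k items drawn from a set T:
\[
  f_k(T) \;=\; \max_{S \subseteq T,\; |S| \le k} f(S) .
\]

% One way to state an alpha-approximate randomized composable core-set:
% (T_1, ..., T_m) is a uniformly random partition of N, and ALG maps each
% piece T_i to a small representative subset ALG(T_i) of T_i.
\[
  \mathbb{E}\!\left[ f_k\bigl( \mathrm{ALG}(T_1) \cup \dots \cup \mathrm{ALG}(T_m) \bigr) \right]
  \;\ge\; \alpha \cdot f_k(N) .
\]
```

In the non-randomized notion, the last inequality would have to hold for every partition rather than in expectation over a random one, which is the setting where the impossibility bounds cited above apply.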

In summary, we show that a simple greedy algorithm results in a 1/3-approximate randomized composable core-set for submodular maximization under a cardinality constraint. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with O(n) total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm PseudoGreedy, we present an improved 0.545-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor 1/2 in a constant number of rounds.
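
The two-round scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`greedy`, `random_partition`, `two_round_greedy`), the toy coverage instance (`SETS`, `coverage`), and the choice to rerun greedy on the union in the second round are all hypothetical, and the code makes no attempt to reproduce PseudoGreedy or to verify the approximation factors stated in the abstract.

```python
import random
from typing import Callable, Hashable, List, Sequence, Set

Item = Hashable
SetFunction = Callable[[Set[Item]], float]


def greedy(f: SetFunction, items: Sequence[Item], k: int) -> List[Item]:
    """Select up to k items, each time adding the one with the largest marginal gain."""
    chosen: List[Item] = []
    remaining = list(items)
    for _ in range(min(k, len(remaining))):
        current = f(set(chosen))
        # Ties are broken by position in `remaining` (max returns the first maximum).
        best = max(remaining, key=lambda x: f(set(chosen) | {x}) - current)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def random_partition(items: Sequence[Item], m: int, rng: random.Random) -> List[List[Item]]:
    """Send each item to one of m pieces independently and uniformly at random."""
    parts: List[List[Item]] = [[] for _ in range(m)]
    for x in items:
        parts[rng.randrange(m)].append(x)
    return parts


def two_round_greedy(f: SetFunction, items: Sequence[Item], k: int, m: int,
                     seed: int = 0) -> List[Item]:
    """Round 1: greedy on each random piece. Round 2: greedy on the union of the results."""
    rng = random.Random(seed)
    parts = random_partition(items, m, rng)
    core_sets = [greedy(f, part, k) for part in parts]  # one core-set per "machine"
    union = [x for core_set in core_sets for x in core_set]
    return greedy(f, union, k)  # assumed second-round choice


# Hypothetical toy instance: item i covers the elements in SETS[i].
SETS = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 7}, 4: {8}, 5: {2, 5, 8}}


def coverage(selected: Set[Item]) -> float:
    """Number of distinct elements covered; a monotone submodular function."""
    return float(len(set().union(*(SETS[i] for i in selected))))


if __name__ == "__main__":
    solution = two_round_greedy(coverage, list(SETS), k=2, m=3, seed=1)
    print(solution, coverage(set(solution)))
```

Each piece forwards at most k items, so the second round sees at most m·k items regardless of the input size; this is what keeps the communication small in the sketch.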

References

[1] S. Abbar, S. Amer-Yahia, P. Indyk, S. Mahabadi, and K. R. Varadarajan. Diverse near neighbor problem. In Symposium on Computational Geometry 2013, SoCG '13, Rio de Janeiro, Brazil, June 17–20, 2013, pages 207–214, 2013.
[2] P. K. Agarwal, G. Cormode, Z. Huang, J. Phillips, Z. Wei, and K. Yi. Mergeable summaries. In Proceedings of the 31st symposium on Principles of Database Systems, pages 23–34. ACM, 2012.
[3] P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Approximating extent measures of points. Journal of the ACM (JACM), 51(4):606–635, 2004.
[4] A. Andoni, A. Nikolov, K. Onak, and G. Yaroslavtsev. Parallel algorithms for geometric graph problems. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 – June 03, 2014, pages 574–583, 2014.
[5] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In KDD, 2014.
[6] A. Badanidiyuru and J. Vondrak. Fast algorithms for maximizing submodular functions. In SODA, pages 1497–1514, 2014.
[7] M.-F. Balcan, S. Ehrlich, and Y. Liang. Distributed clustering on graphs. In NIPS, to appear, 2013.
[8] M. Bateni, A. Bhaskara, S. Lattanzi, and V. Mirrokni. Mapping core-sets for balanced clustering. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5–8, 2013, Lake Tahoe, Nevada, United States, 2014.
[9] G. E. Blelloch, H. V. Simhadri, and K. Tangwongsan. Parallel and I/O efficient set covering algorithms. In SPAA, pages 82–90, 2012.
[10] N. Buchbinder, M. Feldman, J. S. Naor, and R. Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '14, pages 1433–1452. SIAM, 2014.
[11] F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In WWW, pages 231–240, 2010.
[12] G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479–488, 2010.
[13] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137–150, 2004.
[14] S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In STOC, 2001.
[15] P. Indyk, S. Mahabadi, M. Mahdian, and V. Mirrokni. Composable core-sets for diversity and coverage maximization. In ACM PODS, 2014.
[16] M. Kapralov, S. Khanna, and M. Sudan. Approximating matching size from random streams. In SODA, pages 734–751, 2014.
[17] H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for MapReduce. In SODA, pages 938–948, 2010.
[18] R. Kiveris, S. Lattanzi, V. Mirrokni, V. Rastogi, and S. Vassilvitskii. Connected components in MapReduce and beyond. In ACM SOCC, 2014.
[19] N. Korula, V. Mirrokni, and M. Zadimoghaddam. Online submodular welfare maximization: Greedy beats 1/2 in random order. In STOC, to appear, 2015.
[20] R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms in MapReduce and streaming. In SPAA, pages 1–10, 2013.
[21] S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In SPAA, pages 85–94, 2011.
[22] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5–8, 2013, Lake Tahoe, Nevada, United States, pages 2049–2057, 2013.
[23] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.

Index Terms

1. Mathematics of computing
  1. Discrete mathematics
    1. Combinatorics
      1. Combinatorial algorithms
2. Theory of computation
  1. Design and analysis of algorithms

Published in
STOC '15: Proceedings of the forty-seventh annual ACM symposium on Theory of Computing
June 2015
916 pages
ISBN: 9781450335362
DOI: 10.1145/2746539
General Chair: Rocco Servedio, Columbia University
Program Chair: Ronitt Rubinfeld, MIT and Tel Aviv University
Copyright © 2015 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery, New York, NY, United States
Author Tags
core-sets
distributed algorithms
mapreduce algorithms
randomized composable core-sets
streaming algorithms
submodular maximization

Acceptance Rates
STOC '15 Paper Acceptance Rate: 93 of 347 submissions, 27%
Overall Acceptance Rate: 1,469 of 4,586 submissions, 32%