skip to main content
10.1145/2746539.2746624acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article
Open Access

Randomized Composable Core-sets for Distributed Submodular Maximization

Published:14 June 2015Publication History

ABSTRACT

An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has been recently applied to solve diversity maximization problems as well as several clustering problems [7,15,8]. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique [15]. In this paper, we focus on efficient construction of a randomized variant of composable core-sets where the above idea is applied on a random clustering of the data. We employ this technique for the coverage, monotone and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in a distributed and streaming settings. The effectiveness of this technique has been confirmed empirically for several machine learning applications [22], and our proof provides a theoretical foundation to this idea.

In summary, we show that a simple greedy algorithm results in a 1/3-approximate randomized composable core-set for submodular maximization under a cardinality constraint. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with O(n) total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm PseudoGreedy, we present an improved 0.545-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor 1/2 in a constant number of rounds.

References

  1. S. Abbar, S. Amer-Yahia, P. Indyk, S. Mahabadi, and K. R. Varadarajan. Diverse near neighbor problem. In Symposuim on Computational Geometry 2013, SoCG '13, Rio de Janeiro, Brazil, June 17--20, 2013, pages 207--214, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. K. Agarwal, G. Cormode, Z. Huang, J. Phillips, Z. Wei, and K. Yi. Mergeable summaries. In Proceedings of the 31st symposium on Principles of Database Systems, pages 23--34. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Approximating extent measures of points. Journal of the ACM (JACM), 51(4):606--635, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Andoni, A. Nikolov, K. Onak, and G. Yaroslavtsev. Parallel algorithms for geometric graph problems. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 574--583, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In KDD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Badanidiyuru and J. Vondrak. Fast algorithms for maximizing submodular functions. In SODA, pages 1497--1514, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M.-F. Balcan, S. Ehrlich, and Y. Liang. Distributed clustering on graphs. In NIPS, page to appear, 2013.Google ScholarGoogle Scholar
  8. M. Bateni, A. Bhashkara, S. Lattanzi, and V. Mirrokni. Mapping core-sets for balanced clustering. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5--8, 2013, Lake Tahoe, Nevada, United States., 2014.Google ScholarGoogle Scholar
  9. G. E. Blelloch, H. V. Simhadri, and K. Tangwongsan. Parallel and i/o efficient set covering algorithms. In SPAA, pages 82--90, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. Buchbinder, M. Feldman, J. S. Naor, and R. Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '14, pages 1433--1452. SIAM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In WWW, pages 231--240, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479--488, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. STOC, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Indyk, S. Mahabadi, M. Mahdian, and V. Mirrokni. Composable core-sets for diversity and coverage maximization. In ACM PODS, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Kapralov, S. Khanna, and M. Sudan. Approximating matching size from random streams. In SODA, pages 734--751, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938--948, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. kiveris, S. Lattanzi, V. Mirrokni, V. Rastogi, and S. Vasilvitski. Connected components in mapreduce and beyond. In ACM SOCC, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. N. Korula, V. Mirrokni, and M. Zadimoghaddam. Online submodular welfare maximization: Greedy beats 1/2 in random order. In to appear STOC, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms in mapreduce and streaming. In SPAA, pages 1--10, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In SPAA, pages 85--94, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5--8, 2013, Lake Tahoe, Nevada, United States., pages 2049--2057, 2013.Google ScholarGoogle Scholar
  23. G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265--294, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Randomized Composable Core-sets for Distributed Submodular Maximization

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        STOC '15: Proceedings of the forty-seventh annual ACM symposium on Theory of Computing
        June 2015
        916 pages
        ISBN:9781450335362
        DOI:10.1145/2746539

        Copyright © 2015 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 June 2015

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        STOC '15 Paper Acceptance Rate93of347submissions,27%Overall Acceptance Rate1,469of4,586submissions,32%

        Upcoming Conference

        STOC '24
        56th Annual ACM Symposium on Theory of Computing (STOC 2024)
        June 24 - 28, 2024
        Vancouver , BC , Canada

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader