DOI: 10.1145/2588555.2610512
research-article

Opportunistic physical design for big data analytics

Published: 18 June 2014

ABSTRACT

Big data analytical systems, such as MapReduce, perform aggressive materialization of intermediate job results in order to support fault tolerance. When jobs correspond to exploratory queries submitted by data analysts, these materializations yield a large set of materialized views that we propose to treat as an opportunistic physical design. We present a semantic model for UDFs that enables effective reuse of views containing UDFs along with a rewrite algorithm that provably finds the minimum-cost rewrite under certain assumptions. An experimental study on real-world datasets using our prototype based on Hive shows that our approach can result in dramatic performance improvements.
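To make the idea of reusing materialized intermediate results concrete, here is a minimal toy sketch. It is not the paper's rewrite algorithm or its UDF model: the `View` class, the attribute-containment test, the cost numbers, and the view names are all assumptions made for illustration. It only shows the general shape of the decision: among the views that can answer a query, pick the cheapest, and fall back to the base data otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical description of one materialized intermediate result."""
    name: str
    attributes: frozenset  # columns the stored result exposes (assumed)
    read_cost: float       # assumed cost of scanning the stored result


def cheapest_rewrite(needed, views, base_cost):
    """Return (source_name, cost) for the cheapest way to answer a query.

    A view is treated as usable when it exposes every needed attribute.
    This containment check is a deliberate simplification.
    """
    best = ("base data", base_cost)
    for view in views:
        if needed <= view.attributes and view.read_cost < best[1]:
            best = (view.name, view.read_cost)
    return best


# Illustrative views left behind by earlier jobs (names are made up).
views = [
    View("v_sessions", frozenset({"user", "url", "ts"}), 40.0),
    View("v_user_counts", frozenset({"user", "n"}), 5.0),
]

print(cheapest_rewrite(frozenset({"user", "url"}), views, base_cost=100.0))
```

A real system would also have to reason about filters, grouping, and UDF semantics before declaring a view usable; the sketch skips all of that on purpose.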

            Published in

            SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
            June 2014
            1645 pages
            ISBN: 9781450323765
            DOI: 10.1145/2588555

            Copyright © 2014 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            SIGMOD '14 Paper Acceptance Rate: 107 of 421 submissions, 25%. Overall Acceptance Rate: 785 of 4,003 submissions, 20%.
