skip to main content
10.1145/3035918.3064049acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?

Published:09 May 2017Publication History

ABSTRACT

The advent of columnar data analytics engines fueled a series of optimizations on the scan operator. New designs include column-group storage, vectorized execution, shared scans, working directly over compressed data, and operating using SIMD and multi-core execution. Larger main memories and deeper cache hierarchies increase the efficiency of modern scans, prompting a revisit of the question of access path selection.

In this paper, we compare modern sequential scans and secondary index scans. Through detailed analytical modeling and experimentation we show that while scans have become useful in more cases than before, both access paths are still useful, and so, access path selection (APS) is still required to achieve the best performance when considering variable workloads. We show how to perform access path selection. In particular, contrary to the way traditional systems choose between scans and secondary indexes, we find that in addition to the query selectivity, the underlying hardware, and the system design, modern optimizers also need to take into account query concurrency. We further discuss the implications of integrating access path selection in a modern analytical data system. We demonstrate, both theoretically and experimentally, that using the proposed model a system can quickly perform access path selection, outperforming solutions that rely on a single access path or traditional access path models. We outline a light-weight mechanism to integrate APS into main-memory analytical systems that does not interfere with low latency queries. We also use the APS model to explain how the division between sequential scan and secondary index scan has historically changed due to hardware and workload changes, which allows for future projections based on hardware advancements.

References

  1. D. J. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, and S. Madden. The Design and Implementation of Modern Column-Oriented Database Systems. Found. Trends Databases, 5(3):197--280, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. J. Abadi, S. Madden, and M. Ferreira. Integrating Compression and Execution in Column-oriented Database Systems. In SIGMOD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. R. Madden. Materialization Strategies in a Column-Oriented DBMS. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  4. I. Alagiannis, S. Idreos, and A. Ailamaki. H2O: A Hands-free Adaptive Store. In SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. R. Allen, K. Kennedy, C. Porterfield, and J. D. Warren. Conversion of Control Dependence to Data Dependence. In POPL, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. G. Antoshenkov. Dynamic Query Optimization in Rdb/VMS. In ICDE, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Arumugam, A. Dobra, C. M. Jermaine, N. Pansare, and L. Perez. The DataPath System: A Data-centric Analytic Processing Engine for Large Data Warehouses. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Athanassoulis. Solid-State Storage and Work Sharing for Efficient Scaleup Data Analytics. PhD thesis, Ecole Polytechnique Federale de Lausanne, 2014.Google ScholarGoogle Scholar
  9. M. Athanassoulis, M. S. Kester, L. M. Maas, R. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan. Designing Access Methods: The RUM Conjecture. In EDBT, 2016.Google ScholarGoogle Scholar
  10. M. Athanassoulis, Z. Yan, and S. Idreos. UpBit: Scalable In-Memory Updatable Bitmap Indexing. In SIGMOD, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. I. August, W.-m. W. Hwu, and S. A. Mahlke. A Framework for Balancing Control Flow and Predication. In MICRO, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Barber, P. Bendel, M. Czech, O. Draese, F. Ho, N. Hrle, S. Idreos, M.-S. Kim, O. Koeth, J.-G. Lee, T. T. Li, G. M. Lohman, K. Morfonios, R. Müller, K. Murthy, I. Pandis, L. Qiao, V. Raman, R. Sidle, K. Stolze, and S. Szabo. Business Analytics in (a) Blink. IEEE DEBULL, 35(1):9--14, 2012.Google ScholarGoogle Scholar
  13. R. Barber, G. Lohman, V. Raman, R. Sidle, S. Lightstone, and B. Schiefer. In-Memory BLU Acceleration in IBM's DB2 and dashDB: Optimized for Modern Workloads and Hardware Architectures. In ICDE, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  14. C. Binnig, S. Hildenbrand, and F. Färber. Dictionary-based Order-preserving String Compression for Main Memory Column Stores. In SIGMOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. Boncz, M. Zukowski, and N. J. Nes. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR, 2005.Google ScholarGoogle Scholar
  16. P. A. Boncz, M. L. Kersten, and S. Manegold. Breaking the Memory Wall in MonetDB. CACM, 51(12):77--85, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. Borovica-Gajic, S. Idreos, A. Ailamaki, M. Zukowski, and C. Fraser. Smooth Scan: Statistics-Oblivious Access Paths. In ICDE, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  18. G. Candea, N. Polyzotis, and R. Vingralek. Predictable Performance and High Query Concurrency for Data Analytics. VLDBJ, 20(2):227--248, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. Chasseur and J. M. Patel. Design and Evaluation of Storage Organizations for Read-Optimized Main Memory Databases. PVLDB, 6(13):1474--1485, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Chaudhuri. An Overview of Query Optimization in Relational Systems. In PODS, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Chaudhuri. Query Optimizers: Time to Rethink the Contract? In SIGMOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Chaudhuri and V. R. Narasayya. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Christodoulakis. Implications of Certain Assumptions in Database Performance Evaluation. TODS, 9(2):163--186, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. Stonebraker, and D. A. Wood. Implementation Techniques for Main Memory Database Systems. In SIGMOD, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. F. Färber, N. May, W. Lehner, P. Große, I. Müller, H. Rauhe, and J. Dees. The SAP HANA Database -- An Architecture Overview. IEEE DEBULL, 35(1):28--33, 2012.Google ScholarGoogle Scholar
  26. P. Francisco. The Netezza Data Appliance Architecture: A Platform for High Performance Data Warehousing and Analytics. IBM Redbooks, 2011.Google ScholarGoogle Scholar
  27. G. Giannikis, G. Alonso, and D. Kossmann. SharedDB: Killing One Thousand Queries with One Stone. PVLDB, 5(6):526--537, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. G. Giannikis, D. Makreshanski, G. Alonso, and D. Kossmann. Shared Workload Optimization. PVLDB, 7(6):429--440, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. G. Graefe. The Five-Minute Rule Twenty Years Later. In DAMON, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. G. Graefe. Modern B-Tree Techniques. Found. Trends Databases, 3(4):203--402, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. G. Graefe and L. D. Shapiro. Data Compression and Database Performance. In SAC, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  32. J. Gray and F. Putzolu. The 5 Minute Rule for Trading Memory for Disc Accesses and the 5 Byte Rule for Trading Memory for CPU Time. Tandem Computers - Technical Report, 1986.Google ScholarGoogle Scholar
  33. M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudre-Mauroux, and S. Madden. HYRISE: A Main Memory Hybrid Storage Engine. PVLDB, 4(2):105--116, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. QPipe: A Simultaneously Pipelined Relational Query Engine. In SIGMOD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. S. Héman, M. Zukowski, and N. J. Nes. Positional Update Handling in Column Stores. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. Hong, M. Riedewald, C. Koch, J. Gehrke, and A. Demers. Rule-based Multi-Query Optimization. In EDBT, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE DEBULL, 35(1):40--45, 2012.Google ScholarGoogle Scholar
  38. S. Idreos, M. L. Kersten, and S. Manegold. Self-organizing Tuple Reconstruction in Column-Stores. In SIGMOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Intel. Online reference. https://software.intel.com/en-us/articles/intelr-memory-latency-checker.Google ScholarGoogle Scholar
  40. T. Jäkel, H. Voigt, T. Kissinger, and W. Lehner. Pack Indexing for Time-Constrained In-Memory Query Processing. In BTW, 2013.Google ScholarGoogle Scholar
  41. R. Johnson, S. Harizopoulos, N. Hardavellas, K. Sabirli, I. Pandis, A. Ailamaki, N. G. Mancheril, and B. Falsafi. To Share or Not to Share? In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. R. Johnson, V. Raman, R. Sidle, and G. Swart. Row-wise Parallel Predicate Evaluation. PVLDB, 1(1):622--634, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. A. Kemper and T. Neumann. HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots. In ICDE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. S. Khoshafian, G. P. Copeland, T. Jagodis, H. Boral, and P. Valduriez. A Query Processing Strategy for the Decomposed Storage Model. In ICDE, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. C. Kim, J. Chhugani, N. Satish, E. Sedlar, A. D. Nguyen, T. Kaldewey, V. W. Lee, S. A. Brandt, and P. Dubey. FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. T. Kissinger, B. Schlegel, D. Habich, and W. Lehner. KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures. In DAMON, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. J. Krüger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P. Dubey, and A. Zeier. Fast Updates on Read-Optimized Databases Using Multi-Core CPUs. PVLDB, 5(1):61--72, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. T. Lahiri, S. Chavan, M. Colgan, D. Das, A. Ganesh, M. Gleeson, S. Hase, A. Holloway, J. Kamp, T.-H. Lee, J. Loaiza, N. Macnaughton, V. Marwah, N. Mukherjee, A. Mullick, S. Muthulingam, V. Raja, M. Roth, E. Soylemez, and M. Zait. Oracle Database In-Memory: A Dual Format In-Memory Database. In ICDE, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  49. A. Lamb, M. Fuller, and R. Varadarajan. The Vertica Analytic Database: C-Store 7 Years Later. PVLDB, 5(12):1790--1801, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. P.-A. Larson, E. N. Hanson, and M. Zwilling. Evolving the Architecture of SQL Server for Modern Hardware Trends. In ICDE, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  51. V. Leis, P. A. Boncz, A. Kemper, and T. Neumann. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. In SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. V. Leis, A. Kemper, and T. Neumann. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. In ICDE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. J. J. Levandoski, D. B. Lomet, and S. Sengupta. The Bw-Tree: A B-tree for New Hardware Platforms. In ICDE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Y. Li, C. Chasseur, and J. M. Patel. A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew. In SIGMOD, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Y. Li and J. M. Patel. BitWeaving: Fast Scans for Main Memory Data Processing. In SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. S. Manegold, P. A. Boncz, and M. L. Kersten. Generic Database Cost Models for Hierarchical Memory Systems. In VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Y. Mao, E. Kohler, and R. T. Morris. Cache Craftiness for Fast Multicore Key-value Storage. In EuroSys, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. G. Moerkotte. Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing. In VLDB, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. T. Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, 4(9):539--550, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. O. Polychroniou, A. Raghavan, and K. A. Ross. Rethinking SIMD Vectorization for In-Memory Databases. In SIGMOD, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. O. Polychroniou and K. A. Ross. A Comprehensive Study of Main-memory Partitioning and its Application to Large-scale Comparison- and Radix-sort. In SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. I. Psaroudakis, M. Athanassoulis, and A. Ailamaki. Sharing Data and Work Across Concurrent Analytical Queries. PVLDB, 6(9):637--648, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. L. Qiao, V. Raman, F. Reiss, P. J. Haas, and G. M. Lohman. Main-memory Scan Sharing for Multi-core CPUs. PVLDB, 1(1):610--621, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. V. Raman, G. M. Lohman, T. Malkemus, R. Mueller, I. Pandis, B. Schiefer, D. Sharpe, R. Sidle, A. Storm, L. Zhang, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, and S. Liu. DB2 with BLU Acceleration: So Much More Than Just a Column Store. PVLDB, 6(11):1080--1091, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. J. Rao and K. A. Ross. Making B+-trees Cache Conscious in Main Memory. In SIGMOD, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. A. Rasin and S. Zdonik. An Automatic Physical Design Tool for Clustered Column-stores. In EDBT, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. P. Rösch, L. Dannecker, F. Färber, and G. Hackenbroich. A Storage Advisor for Hybrid-Store Databases. PVLDB, 5(12):1748--1758, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. Efficient and Extensible Algorithms for Multi Query Optimization. SIGMOD Rec., 29(2):249--260, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. P. Russom. High-Performance Data Warehousing. TDWI Best Practices Report, 2012.Google ScholarGoogle Scholar
  70. P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access Path Selection in a Relational Database Management System. In SIGMOD, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. T. K. Sellis. Multiple-Query Optimization. TODS, 13(1):23--52, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. T. K. Sellis and S. Ghosh. On the Multiple-query Optimization Problem. TKDE, 2(2):262--266, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. R. Sen, J. Chen, and N. Jimsheleishvilli. Query Optimization Time: The New Bottleneck in Real-time Analytics. In IMDM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. A. Shahvarani and H.-A. Jacobsen. A Hybrid B+-tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms. In SIGMOD, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. N. Shamgunov. The MemSQL In-Memory Database System. In IMDM, 2014.Google ScholarGoogle Scholar
  76. L. Sidirourgos and M. L. Kersten. Column Imprints: A Secondary Index Structure. In SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. R. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-Store: A Column-oriented DBMS. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. L. Sun, M. J. Franklin, S. Krishnan, and R. S. Xin. Fine-grained Partitioning for Aggressive Data Skipping. In SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. N. Tran, A. Lamb, L. Shrinivas, S. Bodagala, and J. Dave. The Vertica Query Optimizer: The Case for Specialized Query Optimizers. In ICDE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  80. P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann. Predictable Performance for Unpredictable Workloads. PVLDB, 2(1):706--717, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. T. Willhalm, N. Popovici, Y. Boshmaf, H. Plattner, A. Zeier, and J. Schaffner. SIMD-Scan: Ultra Fast in-Memory Table Scan using on-Chip Vector Processing Units. PVLDB, 2(1):385--394, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. R. S. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica. Shark: SQL and Rich Analytics at Scale. In SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. H. Zhang, D. G. Andersen, A. Pavlo, M. Kaminsky, L. Ma, and R. Shen. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In SIGMOD, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. J. Zhou and K. A. Ross. Implementing Database Operations Using SIMD Instructions. In SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. M. Zukowski and P. Boncz. Vectorwise: Beyond Column Stores. IEEE DEBULL, 35(1):21--27, 2012.Google ScholarGoogle Scholar
  86. M. Zukowski, P. A. Boncz, and S. Héman. MonetDB/X100 - A DBMS In The CPU Cache. IEEE DEBULL, 28(2):17--22, 2005.Google ScholarGoogle Scholar
  87. M. Zukowski, S. Héman, and P. A. Boncz. Architecture-conscious Hashing. In DAMON, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. M. Zukowski, S. Héman, N. J. Nes, and P. Boncz. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. M. Zukowski, N. Nes, and P. A. Boncz. DSM vs. NSM: CPU Performance Tradeoffs in Block-oriented Query Processing. In DAMON, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
        May 2017
        1810 pages
        ISBN:9781450341974
        DOI:10.1145/3035918

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 May 2017

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader