Abstract
Classic database indexes (e.g., B+-Tree), though speed up queries, suffer from two main drawbacks: (1) An index usually yields 5% to 15% additional storage overhead which results in non-ignorable dollar cost in big data scenarios especially when deployed on modern storage devices. (2) Maintaining an index incurs high latency because the DBMS has to locate and update those index pages affected by the underlying table changes. This paper proposes Hippo a fast, yet scalable, database indexing approach. It significantly shrinks the index storage and mitigates maintenance overhead without compromising much on the query execution performance. Hippo stores disk page ranges instead of tuple pointers in the indexed table to reduce the storage space occupied by the index. It maintains simplified histograms that represent the data distribution and adopts a page grouping technique that groups contiguous pages into page ranges based on the similarity of their index key attribute distributions. When a query is issued, Hippo leverages the page ranges and histogram-based page summaries to recognize those pages such that their tuples are guaranteed not to satisfy the query predicates and inspects the remaining pages. Experiments based on real and synthetic datasets show that Hippo occupies up to two orders of magnitude less storage space than that of the B+-Tree while still achieving comparable query execution performance to that of the B+-Tree for 0.1% -- 1% selectivity factors. Also, the experiments show that Hippo outperforms BRIN (Block Range Index) in executing queries with various selectivity factors. Furthermore, Hippo achieves up to three orders of magnitude less maintenance overhead and up to an order of magnitude higher throughput (for hybrid query/update workloads) than its counterparts.
- New york city taxi and limousine commission. http://www.nyc.gov/html/tlc/html/about/trip\_record\_data.html.Google Scholar
- Page view statistics for wikimedia projects. https://dumps.wikimedia.org/other/pagecounts-raw/.Google Scholar
- S. Agarwal, H. Milner, A. Kleiner, A. Talwalkar, M. Jordan, S. Madden, B. Mozafari, and I. Stoica. Knowing when you're wrong: building fast and reliable approximate query processing systems. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 481--492. ACM, 2014. Google ScholarDigital Library
- M. Athanassoulis and A. Ailamaki. Bf-tree: Approximate tree indexing. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 1881--1892. VLDB Endowment, 2014. Google ScholarDigital Library
- C. Bontempo and G. Zagelow. The ibm data warehouse architecture. The Communications of the ACM, 41(9):38--48, 1998. Google ScholarDigital Library
- T. P. P. Council. Tpc-h benchmark specification. http://www.tcp.org/hspec.html, 2008.Google Scholar
- P. Flajolet, D. Gardy, and L. Thimonier. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics, 39(3):207--229, 1992. Google ScholarDigital Library
- F. Fusco, M. P. Stoecklin, and M. Vlachos. Net-fli: on-the-fly compression, archiving and indexing of streaming network traffic. VLDB Journal, 3(1--2):1382--1393, 2010. Google ScholarDigital Library
- J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. In Proceedings of the International Conference on Data Engineering, ICDE, pages 370--379. IEEE, 1998. Google ScholarDigital Library
- G. Guzun, G. Canahuate, D. Chiu, and J. Sawin. A tunable compression framework for bitmap indices. In Proceedings of the International Conference on Data Engineering, ICDE, pages 484--495. IEEE, 2014.Google ScholarCross Ref
- M. E. Houle and J. Sakuma. Fast approximate similarity search in extremely high-dimensional data sets. In Proceedings of the International Conference on Data Engineering, ICDE, pages 619--630. IEEE, 2005. Google ScholarDigital Library
- D. Lemire, O. Kaser, and K. Aouiche. Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering, 69(1):3--28, 2010. Google ScholarDigital Library
- G. Moerkotte. Small materialized aggregates: A light weight index structure for data warehousing. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 476--487. VLDB Endowment, 1998. Google ScholarDigital Library
- Y. Sakurai, M. Yoshikawa, S. Uemura, H. Kojima, et al. The a-tree: An index structure for high-dimensional spaces using relative approximation. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 5--16. VLDB Endowment, 2000. Google ScholarDigital Library
- L. Sidirourgos and M. L. Kersten. Column imprints: a secondary index structure. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 893--904. ACM, 2013. Google ScholarDigital Library
- K. Stockinger and K. Wu. Bitmap indices for data warehouses. Data Warehouses and OLAP: Concepts, Architectures and Solutions, page 57, 2006.Google Scholar
- M. Stonebraker and L. A. Rowe. The design of postgres. In Proceedings of the International Conference on Management of Data, SIGMOD. ACM, 1986. Google ScholarDigital Library
- R. Weiss. A technical overview of the oracle exadata database machine and exadata storage server. Oracle White Paper. Oracle Corporation, Redwood Shores, 2012.Google Scholar
- K. Wu, E. Otoo, and A. Shoshani. On the performance of bitmap indices for high cardinality attributes. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 24--35. VLDB Endowment, 2004. Google ScholarDigital Library
- K. Zeng, S. Gao, B. Mozafari, and C. Zaniolo. The analytical bootstrap: a new method for fast error estimation in approximate query processing. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 277--288. ACM, 2014. Google ScholarDigital Library
- M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar ram-cpu cache compression. In Proceedings of the International Conference on Data Engineering, ICDE, pages 59--59. IEEE, 2006. Google ScholarDigital Library
Recommendations
Two Birds With One Stone: Boosting Both Search and Write Performance for Tree Indices on Persistent Memory
Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021The advance of byte-addressable persistent memory (PM) makes it a hot topic to revisit traditional tree indices such as B+-tree and radix tree, and a few new persistent memory-friendly tree indices have been proposed. However, due to the special features ...
One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories
An interesting issue in moving object databases is to find similar trajectories of moving objects. Previous work on this topic focuses on movement patterns (trajectories with time dimension) of moving objects, rather than spatial shapes (trajectories ...
One Seed, Two Birds: A Unified Learned Structure for Exact and Approximate Counting
PACMMODThe modern database has many precise and approximate counting requirements. Nevertheless, a solitary multidimensional index or cardinality estimator is insufficient to cater to the escalating demands across all counting scenarios. Such approaches are ...
Comments