research-article

Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems

Authors:
Jia Yu

Arizona State University

Arizona State University
View Profile

,
Mohamed Sarwat

Arizona State University

Arizona State University
View Profile

Proceedings of the VLDB Endowment Volume 10 Issue 4pp 385–396https://doi.org/10.14778/3025111.3025120

Published:01 November 2016Publication History

Proceedings of the VLDB Endowment

Abstract

Classic database indexes (e.g., B⁺-Tree), though speed up queries, suffer from two main drawbacks: (1) An index usually yields 5% to 15% additional storage overhead which results in non-ignorable dollar cost in big data scenarios especially when deployed on modern storage devices. (2) Maintaining an index incurs high latency because the DBMS has to locate and update those index pages affected by the underlying table changes. This paper proposes Hippo a fast, yet scalable, database indexing approach. It significantly shrinks the index storage and mitigates maintenance overhead without compromising much on the query execution performance. Hippo stores disk page ranges instead of tuple pointers in the indexed table to reduce the storage space occupied by the index. It maintains simplified histograms that represent the data distribution and adopts a page grouping technique that groups contiguous pages into page ranges based on the similarity of their index key attribute distributions. When a query is issued, Hippo leverages the page ranges and histogram-based page summaries to recognize those pages such that their tuples are guaranteed not to satisfy the query predicates and inspects the remaining pages. Experiments based on real and synthetic datasets show that Hippo occupies up to two orders of magnitude less storage space than that of the B⁺-Tree while still achieving comparable query execution performance to that of the B⁺-Tree for 0.1% -- 1% selectivity factors. Also, the experiments show that Hippo outperforms BRIN (Block Range Index) in executing queries with various selectivity factors. Furthermore, Hippo achieves up to three orders of magnitude less maintenance overhead and up to an order of magnitude higher throughput (for hybrid query/update workloads) than its counterparts.

References

New york city taxi and limousine commission. http://www.nyc.gov/html/tlc/html/about/trip\_record\_data.html.Google Scholar
Page view statistics for wikimedia projects. https://dumps.wikimedia.org/other/pagecounts-raw/.Google Scholar
S. Agarwal, H. Milner, A. Kleiner, A. Talwalkar, M. Jordan, S. Madden, B. Mozafari, and I. Stoica. Knowing when you're wrong: building fast and reliable approximate query processing systems. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 481--492. ACM, 2014. Google ScholarDigital Library
M. Athanassoulis and A. Ailamaki. Bf-tree: Approximate tree indexing. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 1881--1892. VLDB Endowment, 2014. Google ScholarDigital Library
C. Bontempo and G. Zagelow. The ibm data warehouse architecture. The Communications of the ACM, 41(9):38--48, 1998. Google ScholarDigital Library
T. P. P. Council. Tpc-h benchmark specification. http://www.tcp.org/hspec.html, 2008.Google Scholar
P. Flajolet, D. Gardy, and L. Thimonier. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics, 39(3):207--229, 1992. Google ScholarDigital Library
F. Fusco, M. P. Stoecklin, and M. Vlachos. Net-fli: on-the-fly compression, archiving and indexing of streaming network traffic. VLDB Journal, 3(1--2):1382--1393, 2010. Google ScholarDigital Library
J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. In Proceedings of the International Conference on Data Engineering, ICDE, pages 370--379. IEEE, 1998. Google ScholarDigital Library
G. Guzun, G. Canahuate, D. Chiu, and J. Sawin. A tunable compression framework for bitmap indices. In Proceedings of the International Conference on Data Engineering, ICDE, pages 484--495. IEEE, 2014.Google ScholarCross Ref
M. E. Houle and J. Sakuma. Fast approximate similarity search in extremely high-dimensional data sets. In Proceedings of the International Conference on Data Engineering, ICDE, pages 619--630. IEEE, 2005. Google ScholarDigital Library
D. Lemire, O. Kaser, and K. Aouiche. Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering, 69(1):3--28, 2010. Google ScholarDigital Library
G. Moerkotte. Small materialized aggregates: A light weight index structure for data warehousing. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 476--487. VLDB Endowment, 1998. Google ScholarDigital Library
Y. Sakurai, M. Yoshikawa, S. Uemura, H. Kojima, et al. The a-tree: An index structure for high-dimensional spaces using relative approximation. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 5--16. VLDB Endowment, 2000. Google ScholarDigital Library
L. Sidirourgos and M. L. Kersten. Column imprints: a secondary index structure. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 893--904. ACM, 2013. Google ScholarDigital Library
K. Stockinger and K. Wu. Bitmap indices for data warehouses. Data Warehouses and OLAP: Concepts, Architectures and Solutions, page 57, 2006.Google Scholar
M. Stonebraker and L. A. Rowe. The design of postgres. In Proceedings of the International Conference on Management of Data, SIGMOD. ACM, 1986. Google ScholarDigital Library
R. Weiss. A technical overview of the oracle exadata database machine and exadata storage server. Oracle White Paper. Oracle Corporation, Redwood Shores, 2012.Google Scholar
K. Wu, E. Otoo, and A. Shoshani. On the performance of bitmap indices for high cardinality attributes. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 24--35. VLDB Endowment, 2004. Google ScholarDigital Library
K. Zeng, S. Gao, B. Mozafari, and C. Zaniolo. The analytical bootstrap: a new method for fast error estimation in approximate query processing. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 277--288. ACM, 2014. Google ScholarDigital Library
M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar ram-cpu cache compression. In Proceedings of the International Conference on Data Engineering, ICDE, pages 59--59. IEEE, 2006. Google ScholarDigital Library

Recommendations

Two Birds With One Stone: Boosting Both Search and Write Performance for Tree Indices on Persistent Memory
Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
The advance of byte-addressable persistent memory (PM) makes it a hot topic to revisit traditional tree indices such as B+-tree and radix tree, and a few new persistent memory-friendly tree indices have been proposed. However, due to the special features ...
Read More
One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories

An interesting issue in moving object databases is to find similar trajectories of moving objects. Previous work on this topic focuses on movement patterns (trajectories with time dimension) of moving objects, rather than spatial shapes (trajectories ...
Read More
One Seed, Two Birds: A Unified Learned Structure for Exact and Approximate Counting
PACMMOD

The modern database has many precise and approximate counting requirements. Nevertheless, a solitary multidimensional index or cardinality estimator is insufficient to cater to the escalating demands across all counting scenarios. Such approaches are ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 10, Issue 4
November 2016
180 pages
ISSN:2150-8097
Editors:
Peter Boncz
CWI
,
Ken Salem
University of Waterloo
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 November 2016
Published in pvldb Volume 10, Issue 4
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 12
  Total Citations
  View Citations
- 271
  Total Downloads
- Downloads (Last 12 months)17
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Two Birds With One Stone: Boosting Both Search and Write Performance for Tree Indices on Persistent Memory

One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories

One Seed, Two Birds: A Unified Learned Structure for Exact and Approximate Counting

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Two Birds With One Stone: Boosting Both Search and Write Performance for Tree Indices on Persistent Memory

One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories

One Seed, Two Birds: A Unified Learned Structure for Exact and Approximate Counting

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media