skip to main content
research-article

Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems

Published:01 November 2016Publication History
Skip Abstract Section

Abstract

Classic database indexes (e.g., B+-Tree), though speed up queries, suffer from two main drawbacks: (1) An index usually yields 5% to 15% additional storage overhead which results in non-ignorable dollar cost in big data scenarios especially when deployed on modern storage devices. (2) Maintaining an index incurs high latency because the DBMS has to locate and update those index pages affected by the underlying table changes. This paper proposes Hippo a fast, yet scalable, database indexing approach. It significantly shrinks the index storage and mitigates maintenance overhead without compromising much on the query execution performance. Hippo stores disk page ranges instead of tuple pointers in the indexed table to reduce the storage space occupied by the index. It maintains simplified histograms that represent the data distribution and adopts a page grouping technique that groups contiguous pages into page ranges based on the similarity of their index key attribute distributions. When a query is issued, Hippo leverages the page ranges and histogram-based page summaries to recognize those pages such that their tuples are guaranteed not to satisfy the query predicates and inspects the remaining pages. Experiments based on real and synthetic datasets show that Hippo occupies up to two orders of magnitude less storage space than that of the B+-Tree while still achieving comparable query execution performance to that of the B+-Tree for 0.1% -- 1% selectivity factors. Also, the experiments show that Hippo outperforms BRIN (Block Range Index) in executing queries with various selectivity factors. Furthermore, Hippo achieves up to three orders of magnitude less maintenance overhead and up to an order of magnitude higher throughput (for hybrid query/update workloads) than its counterparts.

References

  1. New york city taxi and limousine commission. http://www.nyc.gov/html/tlc/html/about/trip\_record\_data.html.Google ScholarGoogle Scholar
  2. Page view statistics for wikimedia projects. https://dumps.wikimedia.org/other/pagecounts-raw/.Google ScholarGoogle Scholar
  3. S. Agarwal, H. Milner, A. Kleiner, A. Talwalkar, M. Jordan, S. Madden, B. Mozafari, and I. Stoica. Knowing when you're wrong: building fast and reliable approximate query processing systems. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 481--492. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Athanassoulis and A. Ailamaki. Bf-tree: Approximate tree indexing. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 1881--1892. VLDB Endowment, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Bontempo and G. Zagelow. The ibm data warehouse architecture. The Communications of the ACM, 41(9):38--48, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. T. P. P. Council. Tpc-h benchmark specification. http://www.tcp.org/hspec.html, 2008.Google ScholarGoogle Scholar
  7. P. Flajolet, D. Gardy, and L. Thimonier. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Applied Mathematics, 39(3):207--229, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. F. Fusco, M. P. Stoecklin, and M. Vlachos. Net-fli: on-the-fly compression, archiving and indexing of streaming network traffic. VLDB Journal, 3(1--2):1382--1393, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. In Proceedings of the International Conference on Data Engineering, ICDE, pages 370--379. IEEE, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Guzun, G. Canahuate, D. Chiu, and J. Sawin. A tunable compression framework for bitmap indices. In Proceedings of the International Conference on Data Engineering, ICDE, pages 484--495. IEEE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  11. M. E. Houle and J. Sakuma. Fast approximate similarity search in extremely high-dimensional data sets. In Proceedings of the International Conference on Data Engineering, ICDE, pages 619--630. IEEE, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Lemire, O. Kaser, and K. Aouiche. Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering, 69(1):3--28, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. Moerkotte. Small materialized aggregates: A light weight index structure for data warehousing. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 476--487. VLDB Endowment, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Y. Sakurai, M. Yoshikawa, S. Uemura, H. Kojima, et al. The a-tree: An index structure for high-dimensional spaces using relative approximation. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 5--16. VLDB Endowment, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. Sidirourgos and M. L. Kersten. Column imprints: a secondary index structure. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 893--904. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. K. Stockinger and K. Wu. Bitmap indices for data warehouses. Data Warehouses and OLAP: Concepts, Architectures and Solutions, page 57, 2006.Google ScholarGoogle Scholar
  17. M. Stonebraker and L. A. Rowe. The design of postgres. In Proceedings of the International Conference on Management of Data, SIGMOD. ACM, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Weiss. A technical overview of the oracle exadata database machine and exadata storage server. Oracle White Paper. Oracle Corporation, Redwood Shores, 2012.Google ScholarGoogle Scholar
  19. K. Wu, E. Otoo, and A. Shoshani. On the performance of bitmap indices for high cardinality attributes. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 24--35. VLDB Endowment, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. K. Zeng, S. Gao, B. Mozafari, and C. Zaniolo. The analytical bootstrap: a new method for fast error estimation in approximate query processing. In Proceedings of the International Conference on Management of Data, SIGMOD, pages 277--288. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar ram-cpu cache compression. In Proceedings of the International Conference on Data Engineering, ICDE, pages 59--59. IEEE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 10, Issue 4
    November 2016
    180 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 November 2016
    Published in pvldb Volume 10, Issue 4

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader