skip to main content
article

Optimizing bitmap indices with efficient compression

Authors Info & Claims
Published:01 March 2006Publication History
Skip Abstract Section

Abstract

Bitmap indices are efficient for answering queries on low-cardinality attributes. In this article, we present a new compression scheme called Word-Aligned Hybrid (WAH) code that makes compressed bitmap indices efficient even for high-cardinality attributes. We further prove that the new compressed bitmap index, like the best variants of the B-tree index, is optimal for one-dimensional range queries. More specifically, the time required to answer a one-dimensional range query is a linear function of the number of hits. This strongly supports the well-known observation that compressed bitmap indices are efficient for multidimensional range queries because results of one-dimensional range queries computed with bitmap indices can be easily combined to answer multidimensional range queries. Our timing measurements on range queries not only confirm the linear relationship between the query response time and the number of hits, but also demonstrate that WAH compressed indices answer queries faster than the commonly used indices including projection indices, B-tree indices, and other compressed bitmap indices.

References

  1. Amer-Yahia, S. and Johnson, T. 2000. Optimizing queries on compressed bitmaps. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt, A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K.-Y. Whang, Eds. Morgan Kaufmann, San Francisco, CA, 329--338.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Antoshenkov, G. 1994. Byte-aligned bitmap compression. Tech. rep. Oracle Corp., Redwood Shores, CA. U.S. Patent number 5,363,098.]]Google ScholarGoogle Scholar
  3. Antoshenkov, G. and Ziauddin, M. 1996. Query processing and optimization in ORACLE RDB. VLDB J. 5, 229--237.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Chan, C.-Y. and Ioannidis, Y. E. 1998. Bitmap index design and evaluation. In Proceedings of the 1998 ACM SIGMOD: International Conference on Management of Data. ACM Press, New York, NY, 355--366.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Chan, C. Y. and Ioannidis, Y. E. 1999. An efficient bitmap encoding scheme for selection queries. In SIGMOD 1999, Proceedings of the ACM SIGMOD International Conference on Management of Data, June 1--3, 1999, Philadelphia, Pennsylvania, USA, A. Delis, C. Faloutsos, and S. Ghandeharizadeh, Eds. ACM Press, New York, NY, 215--226.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Chaudhuri, S. and Dayal, U. 1997. An overview of data warehousing and OLAP technology. ACM SIGMOD Rec. 26, 1 (Mar.), 65--74.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Comer, D. 1979. The ubiquitous B-tree. Comput. Surv. 11, 2, 121--137.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Czech, Z. J. and Majewski, B. S. 1993. A linear time algorithm for finding minimal perfect hash functions. Comput. J. 36, 6 (Dec.), 579--587.]]Google ScholarGoogle ScholarCross RefCross Ref
  9. Fox, E. A., Chen, Q. F., Daoud, A. M., and Heath, L. S. 1991. Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Syst. 9, 3, 281--308.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Furuse, K., Asada, K., and Iizawa, A. 1995. Implementation and performance evaluation of compressed bit-sliced signature files. In Information Systems and Data Management, 6th International Conference, CISMOD'95, Bombay, India, November 15--17, 1995, Proceedings, S. Bhalla, Ed. Lecture Notes in Computer Science, vol. 1006. Springer, Berlin, Germany, 164--177.]]Google ScholarGoogle Scholar
  11. Gailly, J. and Adler, M. 1998. zlib 1.1.3 manual. Source code available online at http://-www.info-zip.org/pub/-infozip/zlib.]]Google ScholarGoogle Scholar
  12. Ishikawa, Y., Kitagawa, H., and Ohbo, N. 1993. Evalution of signature files as set access facilities in OODBs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, (Washington, DC May 26--28), P. Buneman and S. Jajodia, Eds. ACM Press, New York, NY, 247--256.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Johnson, T. 1999. Performance measurements of compressed bitmap indices. In VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7--10, 1999, Edinburgh, Scotland, UK, M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, Eds. Morgan Kaufmann, San Francisco, CA, 278--289. A longer version appeared as AT&T report number AMERICA112.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jürgens, M. and Lenz, H.-J. 1999. Tree based indexes vs. bitmap indexes---a performance study. In Proceedings of the International Workshop on Design and Management of Data Warehouses, DMDW'99, Heidelberg, Germany, June 14-15, 1999, S. Gatziu, M. A. Jeusfeld, M. Staudt, and Y. Vassiliou, Eds.]]Google ScholarGoogle Scholar
  15. Knuth, D. E. 1998. The Art of Computer Programming, 2nd ed. Vol. 3. Addison Wesley, Reading, MA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Koudas, N. 2000. Space efficient bitmap indexing. In Proceedings of the Ninth International Conference on Information Knowledge Management (CIKM 2000, November 6--11, McLean, VA). ACM Press, New York, NY, 194--201.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Lee, D. L., Kim, Y. M., and Patel, G. 1995. Efficient signature file methods for text retrieval. IEEE Trans. Knowl. Data Eng. 7, 3, 423--435.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Moffat, A. and Zobel, J. 1992. Parameterised compression for sparse bitmaps. In Proceedings of the ACM-SIGIR International Conference on Research and Development in Information Retrieval, (Copenhagen, Denmark, June), N. Belkin, P. Ingwersen, and A. M. Pejtersen, Eds. ACM Press, New York, NY, 274--285.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. O'Neil, P. 1987. Model 204 architecture and performance. In 2nd International Workshop in High Performance Transaction Systems, Asilomar, CA. Lecture Notes in Computer Science, vol. 359. Springer-Verlag, Berlin, Germany, 40--59.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. O'Neil, P. and O'Neil, E. 2000. Database: Principles, Programming, and Performance, 2nd ed. Morgan Kaugmann, San Francisco, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. O'Neil, P. and Quass, D. 1997. Improved query performance with variant indices. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Tucson, AZ, May 13--15), J. Peckham, Ed. ACM Press, New York, NY, 38--49.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. O'Neil, P. E. and Graefe, G. 1995. Multi-table joins through bitmapped join indices. SIGMOD Rec. 24, 3, 8--11.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Shoshani, A., Bernardo, L. M., Nordberg, H., Rotem, D., and Sim, A. 1999. Multidimensional indexing and query coordination for tertiary storage management. In Proceedings of the 11th International Conference on Scientific and Statistical Database Management (Cleveland, OH, 28--30 July). IEEE Computer Society Press, Los Alamitos, CA, 214--225.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Stockinger, K., Duellmann, D., Hoschek, W., and Schikuta, E. 2000. Improving the performance of high-energy physics analysis through bitmap indices. In Proceedings of the 11th International Conference on Database and Expert Systems Applications (DEXA 2000, London, Greenwich, UK).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Stockinger, K., Wu, K., and Shoshani, A. 2002. Strategies for processing ad hoc queries on large data warehouses. In Proceedings of DOLAP'02 (McLean, VA), 72--79. A draft appeared as Tech rep. LBNL-51791, Lawrence Berkeley National Laboratory, Berkeley, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Wong, H. K. T., Liu, H.-F., Olken, F., Rotem, D., and Wong, L. 1985. Bit transposed files. In Proceedings of VLDB 85 (Stockholm, Sweden). 448--457.]]Google ScholarGoogle Scholar
  27. Wu, K.-L. and Yu, P. 1996. Range-based bitmap indexing for high cardinality attributes with skew. Tech. rep. RC 20449. IBM Watson Research Division, Yorktown Heights, NY.]]Google ScholarGoogle Scholar
  28. Wu, M.-C. and Buchmann, A. P. 1998. Encoded bitmap indexing for data warehouses. In Proceedings of the Fourteenth International Conference on Data Engineering (February 23--27, Orlando, FL). IEEE Computer Society, ACM Press, Los Alamitos, CA, 220--230.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Wu, K., Koegler, W., Chen, J., and Shoshani, A. 2003. Using bitmap index for interactive exploration of large datasets. In Proceedings of SSDBM 2003 (Cambridge, MA), 65--74. A draft appeared as Tech rep. LBNL-52535, Lawrence Berkeley National Laboratory, Berkeley, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Wu, K., Otoo, E. J., and Shoshani, A. 2001a. A performance comparison of bitmap indexes. In Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management (Atlanta, GA, November 5--10). ACM Press, New York, NY, 559--561.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Wu, K., Otoo, E. J., and Shoshani, A. 2002. Compressing bitmap indexes for faster search operations. In Proceedings of SSDBM'02 (Edinburgh, Scotland), 99--108. Also published as Tech rep. LBNL-49627, Lawrence Berkeley National Laboratory, Berkeley, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Wu, K., Otoo, E. J., and Shoshani, A. 2004. On the performance of bitmap indices for high-cardinality attributes. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, August 31-September 3, 2004, M. A. Nascimento, M. T. Özsu, D. Kossmann, R. J. Miller, J. A. Blakeley, and K. B. Schiefer, Eds. Morgan Kaufmann, San Francisco, CA, 24--35.]]Google ScholarGoogle Scholar
  33. Wu, K., Otoo, E. J., Shoshani, A., and Nordberg, H. 2001b. Notes on design and implementation of compressed bit vectors. Tech. rep. LBNL/PUB-3161. Lawrence Berkeley National Laboratory, Berkeley, CA.]]Google ScholarGoogle Scholar
  34. Ziv, J. and Lempel, A. 1977. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theor. 23, 3, 337--343.]]Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Optimizing bitmap indices with efficient compression

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Database Systems
          ACM Transactions on Database Systems  Volume 31, Issue 1
          March 2006
          438 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/1132863
          Issue’s Table of Contents

          Copyright © 2006 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 March 2006
          Published in tods Volume 31, Issue 1

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader