Abstract
Bitmap indices are efficient for answering queries on low-cardinality attributes. In this article, we present a new compression scheme called Word-Aligned Hybrid (WAH) code that makes compressed bitmap indices efficient even for high-cardinality attributes. We further prove that the new compressed bitmap index, like the best variants of the B-tree index, is optimal for one-dimensional range queries. More specifically, the time required to answer a one-dimensional range query is a linear function of the number of hits. This strongly supports the well-known observation that compressed bitmap indices are efficient for multidimensional range queries because results of one-dimensional range queries computed with bitmap indices can be easily combined to answer multidimensional range queries. Our timing measurements on range queries not only confirm the linear relationship between the query response time and the number of hits, but also demonstrate that WAH compressed indices answer queries faster than the commonly used indices including projection indices, B-tree indices, and other compressed bitmap indices.
- Amer-Yahia, S. and Johnson, T. 2000. Optimizing queries on compressed bitmaps. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt, A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K.-Y. Whang, Eds. Morgan Kaufmann, San Francisco, CA, 329--338.]] Google ScholarDigital Library
- Antoshenkov, G. 1994. Byte-aligned bitmap compression. Tech. rep. Oracle Corp., Redwood Shores, CA. U.S. Patent number 5,363,098.]]Google Scholar
- Antoshenkov, G. and Ziauddin, M. 1996. Query processing and optimization in ORACLE RDB. VLDB J. 5, 229--237.]] Google ScholarDigital Library
- Chan, C.-Y. and Ioannidis, Y. E. 1998. Bitmap index design and evaluation. In Proceedings of the 1998 ACM SIGMOD: International Conference on Management of Data. ACM Press, New York, NY, 355--366.]] Google ScholarDigital Library
- Chan, C. Y. and Ioannidis, Y. E. 1999. An efficient bitmap encoding scheme for selection queries. In SIGMOD 1999, Proceedings of the ACM SIGMOD International Conference on Management of Data, June 1--3, 1999, Philadelphia, Pennsylvania, USA, A. Delis, C. Faloutsos, and S. Ghandeharizadeh, Eds. ACM Press, New York, NY, 215--226.]] Google ScholarDigital Library
- Chaudhuri, S. and Dayal, U. 1997. An overview of data warehousing and OLAP technology. ACM SIGMOD Rec. 26, 1 (Mar.), 65--74.]] Google ScholarDigital Library
- Comer, D. 1979. The ubiquitous B-tree. Comput. Surv. 11, 2, 121--137.]] Google ScholarDigital Library
- Czech, Z. J. and Majewski, B. S. 1993. A linear time algorithm for finding minimal perfect hash functions. Comput. J. 36, 6 (Dec.), 579--587.]]Google ScholarCross Ref
- Fox, E. A., Chen, Q. F., Daoud, A. M., and Heath, L. S. 1991. Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Syst. 9, 3, 281--308.]] Google ScholarDigital Library
- Furuse, K., Asada, K., and Iizawa, A. 1995. Implementation and performance evaluation of compressed bit-sliced signature files. In Information Systems and Data Management, 6th International Conference, CISMOD'95, Bombay, India, November 15--17, 1995, Proceedings, S. Bhalla, Ed. Lecture Notes in Computer Science, vol. 1006. Springer, Berlin, Germany, 164--177.]]Google Scholar
- Gailly, J. and Adler, M. 1998. zlib 1.1.3 manual. Source code available online at http://-www.info-zip.org/pub/-infozip/zlib.]]Google Scholar
- Ishikawa, Y., Kitagawa, H., and Ohbo, N. 1993. Evalution of signature files as set access facilities in OODBs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, (Washington, DC May 26--28), P. Buneman and S. Jajodia, Eds. ACM Press, New York, NY, 247--256.]] Google ScholarDigital Library
- Johnson, T. 1999. Performance measurements of compressed bitmap indices. In VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7--10, 1999, Edinburgh, Scotland, UK, M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, Eds. Morgan Kaufmann, San Francisco, CA, 278--289. A longer version appeared as AT&T report number AMERICA112.]] Google ScholarDigital Library
- Jürgens, M. and Lenz, H.-J. 1999. Tree based indexes vs. bitmap indexes---a performance study. In Proceedings of the International Workshop on Design and Management of Data Warehouses, DMDW'99, Heidelberg, Germany, June 14-15, 1999, S. Gatziu, M. A. Jeusfeld, M. Staudt, and Y. Vassiliou, Eds.]]Google Scholar
- Knuth, D. E. 1998. The Art of Computer Programming, 2nd ed. Vol. 3. Addison Wesley, Reading, MA.]] Google ScholarDigital Library
- Koudas, N. 2000. Space efficient bitmap indexing. In Proceedings of the Ninth International Conference on Information Knowledge Management (CIKM 2000, November 6--11, McLean, VA). ACM Press, New York, NY, 194--201.]] Google ScholarDigital Library
- Lee, D. L., Kim, Y. M., and Patel, G. 1995. Efficient signature file methods for text retrieval. IEEE Trans. Knowl. Data Eng. 7, 3, 423--435.]] Google ScholarDigital Library
- Moffat, A. and Zobel, J. 1992. Parameterised compression for sparse bitmaps. In Proceedings of the ACM-SIGIR International Conference on Research and Development in Information Retrieval, (Copenhagen, Denmark, June), N. Belkin, P. Ingwersen, and A. M. Pejtersen, Eds. ACM Press, New York, NY, 274--285.]] Google ScholarDigital Library
- O'Neil, P. 1987. Model 204 architecture and performance. In 2nd International Workshop in High Performance Transaction Systems, Asilomar, CA. Lecture Notes in Computer Science, vol. 359. Springer-Verlag, Berlin, Germany, 40--59.]] Google ScholarDigital Library
- O'Neil, P. and O'Neil, E. 2000. Database: Principles, Programming, and Performance, 2nd ed. Morgan Kaugmann, San Francisco, CA.]] Google ScholarDigital Library
- O'Neil, P. and Quass, D. 1997. Improved query performance with variant indices. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Tucson, AZ, May 13--15), J. Peckham, Ed. ACM Press, New York, NY, 38--49.]] Google ScholarDigital Library
- O'Neil, P. E. and Graefe, G. 1995. Multi-table joins through bitmapped join indices. SIGMOD Rec. 24, 3, 8--11.]] Google ScholarDigital Library
- Shoshani, A., Bernardo, L. M., Nordberg, H., Rotem, D., and Sim, A. 1999. Multidimensional indexing and query coordination for tertiary storage management. In Proceedings of the 11th International Conference on Scientific and Statistical Database Management (Cleveland, OH, 28--30 July). IEEE Computer Society Press, Los Alamitos, CA, 214--225.]] Google ScholarDigital Library
- Stockinger, K., Duellmann, D., Hoschek, W., and Schikuta, E. 2000. Improving the performance of high-energy physics analysis through bitmap indices. In Proceedings of the 11th International Conference on Database and Expert Systems Applications (DEXA 2000, London, Greenwich, UK).]] Google ScholarDigital Library
- Stockinger, K., Wu, K., and Shoshani, A. 2002. Strategies for processing ad hoc queries on large data warehouses. In Proceedings of DOLAP'02 (McLean, VA), 72--79. A draft appeared as Tech rep. LBNL-51791, Lawrence Berkeley National Laboratory, Berkeley, CA.]] Google ScholarDigital Library
- Wong, H. K. T., Liu, H.-F., Olken, F., Rotem, D., and Wong, L. 1985. Bit transposed files. In Proceedings of VLDB 85 (Stockholm, Sweden). 448--457.]]Google Scholar
- Wu, K.-L. and Yu, P. 1996. Range-based bitmap indexing for high cardinality attributes with skew. Tech. rep. RC 20449. IBM Watson Research Division, Yorktown Heights, NY.]]Google Scholar
- Wu, M.-C. and Buchmann, A. P. 1998. Encoded bitmap indexing for data warehouses. In Proceedings of the Fourteenth International Conference on Data Engineering (February 23--27, Orlando, FL). IEEE Computer Society, ACM Press, Los Alamitos, CA, 220--230.]] Google ScholarDigital Library
- Wu, K., Koegler, W., Chen, J., and Shoshani, A. 2003. Using bitmap index for interactive exploration of large datasets. In Proceedings of SSDBM 2003 (Cambridge, MA), 65--74. A draft appeared as Tech rep. LBNL-52535, Lawrence Berkeley National Laboratory, Berkeley, CA.]] Google ScholarDigital Library
- Wu, K., Otoo, E. J., and Shoshani, A. 2001a. A performance comparison of bitmap indexes. In Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management (Atlanta, GA, November 5--10). ACM Press, New York, NY, 559--561.]] Google ScholarDigital Library
- Wu, K., Otoo, E. J., and Shoshani, A. 2002. Compressing bitmap indexes for faster search operations. In Proceedings of SSDBM'02 (Edinburgh, Scotland), 99--108. Also published as Tech rep. LBNL-49627, Lawrence Berkeley National Laboratory, Berkeley, CA.]] Google ScholarDigital Library
- Wu, K., Otoo, E. J., and Shoshani, A. 2004. On the performance of bitmap indices for high-cardinality attributes. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, August 31-September 3, 2004, M. A. Nascimento, M. T. Özsu, D. Kossmann, R. J. Miller, J. A. Blakeley, and K. B. Schiefer, Eds. Morgan Kaufmann, San Francisco, CA, 24--35.]]Google Scholar
- Wu, K., Otoo, E. J., Shoshani, A., and Nordberg, H. 2001b. Notes on design and implementation of compressed bit vectors. Tech. rep. LBNL/PUB-3161. Lawrence Berkeley National Laboratory, Berkeley, CA.]]Google Scholar
- Ziv, J. and Lempel, A. 1977. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theor. 23, 3, 337--343.]]Google ScholarDigital Library
Index Terms
- Optimizing bitmap indices with efficient compression
Recommendations
APPLE: a new compression scheme for bitmap indexes: poster abstract
SenSys '20: Proceedings of the 18th Conference on Embedded Networked Sensor SystemsCompressed bitmap indexes are increasingly used in databases and search engines. By exploiting bit-level parallelism and bitwise operations, e.g. AND/OR operations, they can significantly accelerate the development of many areas. The Word Aligned Hybrid ...
Optimizing candidate check costs for bitmap indices
CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge managementIn this paper, we propose a new strategy for optimizing the placement of bin boundaries to minimize the cost of query evaluation using bitmap indices with binning. For attributes with a large number of distinct values, often the most efficient index ...
Investigating design choices between Bitmap index and B-tree index for a large data warehouse system
ACS'08: Proceedings of the 8th conference on Applied computer scinceBuilding indexes on database is common, but it has an important impact on the query performance, especially in large databases such as a Data Warehouse where the queries are usually very complex and ad hoc. If a proper index structure is chosen, the ...
Comments