skip to main content
10.1145/1142351.1142385acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Cache-oblivious string B-trees

Published:26 June 2006Publication History

ABSTRACT

B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally

  • when keys are long or of variable length,

  • when keys are compressed, even when using front compression, the standard B-tree compression scheme,

  • for range queries, and

  • with respect to memory effects such as disk prefetching.

This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways:

  • The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally.

  • It maintains an index whose size is proportional to the front-compressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner.

  • It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block.

  • It utilizes all levels of a memory hierarchy efficiently and makes good use of disk locality by using cache-oblivious layout strategies.

References

  1. A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116--1127, Sept. 1988.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Alstrup, M. A. Bender, E. D. Demaine, M. Farach-Colton, T. Rauhe, and M. Thorup. Efficient tree layout in a multilevel memory hierarchy. arXiv:cs.DS/0211010, 2004. http://www.arXiv.org/pdf/cs.DS/0211010.]]Google ScholarGoogle Scholar
  3. A. Amir, M. Farach, and Y. Matias. Efficient randomized dictionary matching algorithms (extended abstract). In Proc. 3rd Symp. on Combinatorial Pattern Matching (CPM), pp. 262--275, Tucson, Arizona, Apr. 1992.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Bayer and E. M. McCreight. Organization and maintenance of large ordered indexes. Acta Inf., 1(3):173--189, Feb. 1972.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Bayer and K. Unterauer. Prefix B-trees. ACM Trans. Database Syst., 2(1):11--26, 1977.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. A. Bender, R. Cole, E. Demaine, M. Farach-Colton, and J. Zito. Two simplified algorithms for maintaining order in a list. In Proc. 10th European Symp. on Algorithms (ESA), pp. 152--164, Rome, Italy, Sept. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. A. Bender, R. Cole, E. D. Demaine, and M. Farach-Colton. Scanning and traversing: Maintaining data for traversals in a memory hierarchy. In Proc. 10th Annual European Symp. on Algorithms (ESA), pp. 139--151, Rome, Italy, Sept. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. A. Bender, E. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. In Proc. 41st Annual Symp. on Foundations of Computer Science (FOCS), pp. 399--409, Redondo Beach, California, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. A. Bender, E. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. SIAM J. Comput., 35(2):341--358, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. A. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cache-oblivious dynamic dictionary. In Proc. 13th Annual Symp. on Discrete Mathematics (SODA), pp. 29--38, San Francisco, California, Jan. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. A. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cacheoblivious dynamic dictionary. J. Algorithms, 3(2):115--136, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. S. Brodal and R. Fagerberg. Cache-oblivious string dictionaries. In Proc. 17th Annual Symposium on Discrete Algorithm (SODA), pp. 581--590, Miami, Florida, Jan. 2006.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. S. Brodal, R. Fagerberg, and R. Jacob. Cache oblivious search trees via binary trees of small height. In Proc. 13th Annual Symposium on Discrete Algorithms (SODA), pp. 39--48, San Francisco, California, Jan. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. K. Chang. Compressed indexing method. IBM Technical Disclosure Bulletin, 11(11):1400--1401, 1969.]]Google ScholarGoogle Scholar
  15. W. A. Clark IV, K. A. Salmond, and T. A. Stafford. Method and means for generating compressed keys. US Patent 3,593,309, 3 Jan. 1969.]]Google ScholarGoogle Scholar
  16. D. Comer. The ubiquitous B-tree. ACM Comput. Surv., 11(2):121--137, June 1979.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in a memory hierarchy. Manuscript. arXiv:DS/0410048, 2004. http://john2.poly.edu/papers/manu04/paper.pdf.]]Google ScholarGoogle Scholar
  18. P. F. Dietz and D. D. Sleator. Two algorithms for maintaining order in a list. In Proc. 19th Annual Symposium on Theory of Computing (STOC), pp. 365--372, 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. P. Ferragina and R. Grossi. The String B-Tree: A new data structure for string search in external memory and its applications. J. ACM, 46(2):236--280, Mar. 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. L. Fredman and D. E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Science, 47:424--436, 1994.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symp. on Foundations of Computer Science (FOCS), pp. 285--297, New York, New York, Oct. 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Gil and A. Itai. How to pack trees. J. Algorithms, 32(2):108--132, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. Itai, A. G. Konheim, and M. Rodeh. A sparse table implementation of priority queues. In Proc. 8th Internationl Colloquium on Automata, Languages, and Programming (ICALP), vol. 115, pp. 417--431, Acre (Akko), Israel, July 1981.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249--260, Mar. 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. I. Katriel. Implicit data structures based on local reorganizations. Master's thesis, Technion, Israel Inst. of Tech., Haifa, May 2002.]]Google ScholarGoogle Scholar
  26. D. E. Knuth. Sorting and Searching, vol. 3 of The Art of Computer Programming. Addison-Wesley, Reading, Massachusetts, 1973.]]Google ScholarGoogle Scholar
  27. K. Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching, pp. 198--199. Springer-Verlag, Berlin, 1984.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. M. Naor. String matching with preprocessing of text and pattern. In Proc. 18th International Colloquium on Automata, Languages and Programming (ICALP), pp. 739--750, 1991.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. H. Prokop. Cache-oblivious algorithms. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Inst. of Tech., June 1999.]]Google ScholarGoogle Scholar
  30. Sleepycat Software. The Berkeley Database. http://www.sleepycat.com, 2005.]]Google ScholarGoogle Scholar
  31. K. A. Smith and M. I. Seltzer. File system aging - increasing the relevance of file system benchmarks. In Proc. 1997 ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pp. 203--213, Seattle, Washington, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. R. E. Wagner. Indexing design considerations. IBM Syst. J., 12(4):351--367, 1973.]]Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Cache-oblivious string B-trees

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
            June 2006
            382 pages
            ISBN:1595933182
            DOI:10.1145/1142351

            Copyright © 2006 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 26 June 2006

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            PODS '06 Paper Acceptance Rate35of185submissions,19%Overall Acceptance Rate642of2,707submissions,24%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader