ABSTRACT
B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally
when keys are long or of variable length,
when keys are compressed, even when using front compression, the standard B-tree compression scheme,
for range queries, and
with respect to memory effects such as disk prefetching.
The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally.
It maintains an index whose size is proportional to the front-compressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner.
It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block.
It utilizes all levels of a memory hierarchy efficiently and makes good use of disk locality by using cache-oblivious layout strategies.
- A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116--1127, Sept. 1988.]] Google ScholarDigital Library
- S. Alstrup, M. A. Bender, E. D. Demaine, M. Farach-Colton, T. Rauhe, and M. Thorup. Efficient tree layout in a multilevel memory hierarchy. arXiv:cs.DS/0211010, 2004. http://www.arXiv.org/pdf/cs.DS/0211010.]]Google Scholar
- A. Amir, M. Farach, and Y. Matias. Efficient randomized dictionary matching algorithms (extended abstract). In Proc. 3rd Symp. on Combinatorial Pattern Matching (CPM), pp. 262--275, Tucson, Arizona, Apr. 1992.]] Google ScholarDigital Library
- R. Bayer and E. M. McCreight. Organization and maintenance of large ordered indexes. Acta Inf., 1(3):173--189, Feb. 1972.]]Google ScholarDigital Library
- R. Bayer and K. Unterauer. Prefix B-trees. ACM Trans. Database Syst., 2(1):11--26, 1977.]] Google ScholarDigital Library
- M. A. Bender, R. Cole, E. Demaine, M. Farach-Colton, and J. Zito. Two simplified algorithms for maintaining order in a list. In Proc. 10th European Symp. on Algorithms (ESA), pp. 152--164, Rome, Italy, Sept. 2002.]] Google ScholarDigital Library
- M. A. Bender, R. Cole, E. D. Demaine, and M. Farach-Colton. Scanning and traversing: Maintaining data for traversals in a memory hierarchy. In Proc. 10th Annual European Symp. on Algorithms (ESA), pp. 139--151, Rome, Italy, Sept. 2002.]] Google ScholarDigital Library
- M. A. Bender, E. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. In Proc. 41st Annual Symp. on Foundations of Computer Science (FOCS), pp. 399--409, Redondo Beach, California, 2000.]] Google ScholarDigital Library
- M. A. Bender, E. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. SIAM J. Comput., 35(2):341--358, 2005.]] Google ScholarDigital Library
- M. A. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cache-oblivious dynamic dictionary. In Proc. 13th Annual Symp. on Discrete Mathematics (SODA), pp. 29--38, San Francisco, California, Jan. 2002.]] Google ScholarDigital Library
- M. A. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cacheoblivious dynamic dictionary. J. Algorithms, 3(2):115--136, 2004.]] Google ScholarDigital Library
- G. S. Brodal and R. Fagerberg. Cache-oblivious string dictionaries. In Proc. 17th Annual Symposium on Discrete Algorithm (SODA), pp. 581--590, Miami, Florida, Jan. 2006.]] Google ScholarDigital Library
- G. S. Brodal, R. Fagerberg, and R. Jacob. Cache oblivious search trees via binary trees of small height. In Proc. 13th Annual Symposium on Discrete Algorithms (SODA), pp. 39--48, San Francisco, California, Jan. 2002.]] Google ScholarDigital Library
- H. K. Chang. Compressed indexing method. IBM Technical Disclosure Bulletin, 11(11):1400--1401, 1969.]]Google Scholar
- W. A. Clark IV, K. A. Salmond, and T. A. Stafford. Method and means for generating compressed keys. US Patent 3,593,309, 3 Jan. 1969.]]Google Scholar
- D. Comer. The ubiquitous B-tree. ACM Comput. Surv., 11(2):121--137, June 1979.]] Google ScholarDigital Library
- E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in a memory hierarchy. Manuscript. arXiv:DS/0410048, 2004. http://john2.poly.edu/papers/manu04/paper.pdf.]]Google Scholar
- P. F. Dietz and D. D. Sleator. Two algorithms for maintaining order in a list. In Proc. 19th Annual Symposium on Theory of Computing (STOC), pp. 365--372, 1987.]] Google ScholarDigital Library
- P. Ferragina and R. Grossi. The String B-Tree: A new data structure for string search in external memory and its applications. J. ACM, 46(2):236--280, Mar. 1999.]] Google ScholarDigital Library
- M. L. Fredman and D. E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Science, 47:424--436, 1994.]] Google ScholarDigital Library
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symp. on Foundations of Computer Science (FOCS), pp. 285--297, New York, New York, Oct. 1999.]] Google ScholarDigital Library
- J. Gil and A. Itai. How to pack trees. J. Algorithms, 32(2):108--132, 1999.]] Google ScholarDigital Library
- A. Itai, A. G. Konheim, and M. Rodeh. A sparse table implementation of priority queues. In Proc. 8th Internationl Colloquium on Automata, Languages, and Programming (ICALP), vol. 115, pp. 417--431, Acre (Akko), Israel, July 1981.]] Google ScholarDigital Library
- R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249--260, Mar. 1987.]] Google ScholarDigital Library
- I. Katriel. Implicit data structures based on local reorganizations. Master's thesis, Technion, Israel Inst. of Tech., Haifa, May 2002.]]Google Scholar
- D. E. Knuth. Sorting and Searching, vol. 3 of The Art of Computer Programming. Addison-Wesley, Reading, Massachusetts, 1973.]]Google Scholar
- K. Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching, pp. 198--199. Springer-Verlag, Berlin, 1984.]] Google ScholarDigital Library
- M. Naor. String matching with preprocessing of text and pattern. In Proc. 18th International Colloquium on Automata, Languages and Programming (ICALP), pp. 739--750, 1991.]] Google ScholarDigital Library
- H. Prokop. Cache-oblivious algorithms. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Inst. of Tech., June 1999.]]Google Scholar
- Sleepycat Software. The Berkeley Database. http://www.sleepycat.com, 2005.]]Google Scholar
- K. A. Smith and M. I. Seltzer. File system aging - increasing the relevance of file system benchmarks. In Proc. 1997 ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pp. 203--213, Seattle, Washington, 1997.]] Google ScholarDigital Library
- R. E. Wagner. Indexing design considerations. IBM Syst. J., 12(4):351--367, 1973.]]Google ScholarDigital Library
Index Terms
- Cache-oblivious string B-trees
Recommendations
Concurrent cache-oblivious b-trees
SPAA '05: Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architecturesThis paper presents concurrent cache-oblivious (CO) B-trees. We extend the cache-oblivious model to a parallel or distributed setting and present three concurrent CO B-trees. Our first data structure is a concurrent lock-based exponential CO B-tree. ...
Prefix B-trees
Two modifications of B-trees are described, simple prefix B-trees and prefix B-trees. Both store only parts of keys, namely prefixes, in the index part of a B*-tree. In simple prefix B-trees those prefixes are selected carefully to minimize their ...
Cache-oblivious streaming B-trees
SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architecturesA streaming B-tree is a dictionary that efficiently implements insertions and range queries. We present two cache-oblivious streaming B-trees, the shuttle tree, and the cache-oblivious lookahead array (COLA).
For block-transfer size B and on N elements, ...
Comments